2024-08-13 14:40:41,656 INFO [train_multi_KD3.py:1187] (1/4) Training started
2024-08-13 14:40:41,657 INFO [train_multi_KD3.py:1197] (1/4) Device: cuda:1
2024-08-13 14:40:41,658 INFO [train_multi_KD3.py:1212] (1/4) Using dtype=torch.bfloat16
2024-08-13 14:40:41,658 INFO [train_multi_KD3.py:1214] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 16, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-13 14:40:41,658 INFO [train_multi_KD3.py:1216] (1/4) About to create model
2024-08-13 14:40:42,023 INFO [model_shift.py:142] (1/4) Delta_t: 6 when computing the distillation loss
2024-08-13 14:40:42,027 INFO [train_multi_KD3.py:1220] (1/4) Number of model parameters: 66484678
2024-08-13 14:40:42,027 INFO [checkpoint.py:112] (1/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-15.pt
2024-08-13 14:40:44,323 INFO [train_multi_KD3.py:1235] (1/4) Using DDP
2024-08-13 14:40:46,708 INFO [train_multi_KD3.py:1247] (1/4) Loading optimizer state dict
2024-08-13 14:40:47,135 INFO [train_multi_KD3.py:1255] (1/4) Loading scheduler state dict
2024-08-13 14:40:47,135 INFO [kd_datamodule.py:690] (1/4) About to get train 960 cuts
2024-08-13 14:40:47,180 INFO [train_multi_KD3.py:1306] (1/4) Getting audioset cuts
2024-08-13 14:40:47,180 INFO [kd_datamodule.py:900] (1/4) About to get the audioset cuts for KD.
2024-08-13 14:40:47,182 INFO [kd_datamodule.py:869] (1/4) About to get the voxceleb cuts.
2024-08-13 14:40:47,183 INFO [kd_datamodule.py:880] (1/4) Adding voxceleb2 cuts.
2024-08-13 14:40:47,185 INFO [train_multi_KD3.py:1320] (1/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-13 14:40:55,708 INFO [train_multi_KD3.py:1322] (1/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]]
2024-08-13 14:40:55,709 INFO [train_multi_KD3.py:1323] (1/4) Using weights: [1406195, 1904746, 1187704]
2024-08-13 14:40:55,709 INFO [train_multi_KD3.py:1332] (1/4) CutSet(len=4498645) [underlying data type: ]
2024-08-13 14:40:55,710 INFO [kd_datamodule.py:449] (1/4) Disable MUSAN
2024-08-13 14:40:55,710 INFO [kd_datamodule.py:489] (1/4) Disable SpecAugment
2024-08-13 14:40:55,710 INFO [kd_datamodule.py:491] (1/4) About to create train dataset
2024-08-13 14:40:55,711 INFO [kd_datamodule.py:528] (1/4) Using SimpleCutSampler
2024-08-13 14:40:55,712 INFO [kd_datamodule.py:536] (1/4) About to create train dataloader
2024-08-13 14:40:55,714 INFO [kd_datamodule.py:763] (1/4) About to get dev-clean cuts
2024-08-13 14:40:55,716 INFO [kd_datamodule.py:781] (1/4) About to get dev-other cuts
2024-08-13 14:40:55,717 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset
2024-08-13 14:40:56,006 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader
2024-08-13 14:40:56,006 INFO [kd_datamodule.py:840] (1/4) About to get the test set of voxceleb1 set.
2024-08-13 14:40:56,007 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset
2024-08-13 14:40:56,255 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader
2024-08-13 14:40:56,256 INFO [kd_datamodule.py:912] (1/4) About to get the audioset eval cuts.
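The mux records above show three CutSets being combined with sampling weights equal to their sizes (1406195, 1904746, 1187704), so each training example is drawn from a source with probability proportional to that source's size. A minimal pure-Python sketch of what such weighted muxing does (illustrative only, not lhotse's actual `CutSet.mux` implementation):

```python
import random

def mux(iterables, weights, seed=42):
    """Yield items from several iterators, picking the source of each
    item at random with probability proportional to its weight."""
    rng = random.Random(seed)
    iters = [iter(it) for it in iterables]
    weights = list(weights)
    while iters:
        # Choose a source proportionally to the remaining weights.
        i = rng.choices(range(len(iters)), weights=weights)[0]
        try:
            yield next(iters[i])
        except StopIteration:
            # Source exhausted: drop it and keep muxing the rest.
            del iters[i], weights[i]

# Toy stand-ins for the three CutSets; weights taken from the log.
mixed = list(mux([["ls"] * 3, ["vox"] * 2, ["as"] * 1],
                 weights=[1406195, 1904746, 1187704]))
print(len(mixed))  # 6: every item appears, interleaved at random
```

Because exhausted sources are dropped rather than restarted, every item is emitted exactly once; only the interleaving order is random.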
2024-08-13 14:40:56,257 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset
2024-08-13 14:40:56,728 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader
2024-08-13 14:40:56,729 INFO [train_multi_KD3.py:1412] (1/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-13 14:40:56,729 INFO [train_multi_KD3.py:1416] (1/4) Loading grad scaler state dict
2024-08-13 14:41:09,448 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 from AS
2024-08-13 14:41:13,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 0, loss[loss=0.09328, beats_loss=0.01019, ecapa_loss=0.0001849, whisper_loss=0.08123, over 21861.00 frames. ], tot_loss[loss=0.09328, beats_loss=0.01019, ecapa_loss=0.0001849, whisper_loss=0.08123, over 21861.00 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:41:13,684 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-13 14:41:44,625 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005685, whisper_loss=0.2484, over 922467.00 frames.
2024-08-13 14:41:57,987 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on SV_voxceleb1: loss=0.004519, beats_loss=0, ecapa_loss=0.0004519, whisper_loss=0, over 939242.00 frames.
2024-08-13 14:43:06,399 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.7324, 2.4185, 2.3709, 2.2802, 3.1040, 2.2970, 2.6384, 2.1466], device='cuda:1')
2024-08-13 14:43:30,512 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on AT_audioset: loss=0.02374, beats_loss=0.02374, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
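The `grad_scale` values logged with each batch come from AMP dynamic loss scaling (note the "Loading grad scaler state dict" record above): the scaler backs off when gradients overflow and doubles the scale after a run of overflow-free steps, which is why `grad_scale` later in this log jumps from 5.764607523034235e+17 to 1.152921504606847e+18. A simplified sketch of that update rule (illustrative; not `torch.cuda.amp.GradScaler` itself, whose defaults include growth_interval=2000):

```python
def update_scale(scale, found_inf, good_steps,
                 growth_factor=2.0, backoff_factor=0.5, growth_interval=2000):
    """One dynamic-loss-scale update: back off on overflow, double the
    scale after `growth_interval` consecutive overflow-free steps."""
    if found_inf:
        return scale * backoff_factor, 0
    good_steps += 1
    if good_steps == growth_interval:
        return scale * growth_factor, 0
    return scale, good_steps

# Start from the grad_scale value seen at batch 0 in this log.
scale, good = 5.764607523034235e+17, 0
for _ in range(2000):  # 2000 clean steps -> one doubling
    scale, good = update_scale(scale, False, good)
print(scale)  # 1.152921504606847e+18
```

A single overflow would instead halve the scale and reset the counter, so the logged scale can move in both directions over a long run.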
2024-08-13 14:43:30,513 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-13 14:43:31,269 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0
2024-08-13 14:43:37,156 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 11 from LS+wenet, 11 from Vox, 39 from AS
2024-08-13 14:43:40,338 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 14:44:06,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2173910.0, ans=0.2
2024-08-13 14:44:37,458 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 12 from Vox, 33 from AS
2024-08-13 14:44:58,179 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 24 from Vox, 24 from AS
2024-08-13 14:45:01,225 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 from AS
2024-08-13 14:45:23,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2174110.0, ans=0.125
2024-08-13 14:45:58,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0
2024-08-13 14:45:59,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 50, loss[loss=0.1305, beats_loss=0.00925, ecapa_loss=0.0001233, whisper_loss=0.12, over 24591.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009858, ecapa_loss=0.000168, whisper_loss=0.0908, over 882075.26 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:46:29,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2174310.0, ans=0.125
2024-08-13 14:46:38,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.896e+01 3.246e+01 4.521e+01, threshold=5.792e+01, percent-clipped=0.0
2024-08-13 14:47:10,303 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 25 from Vox, 22 from AS
2024-08-13 14:47:17,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2174510.0, ans=0.05
2024-08-13 14:47:38,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2174510.0, ans=0.0
2024-08-13 14:47:41,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2174510.0, ans=0.125
2024-08-13 14:47:50,715 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 from AS
2024-08-13 14:48:14,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=12.0
2024-08-13 14:48:25,269 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0
2024-08-13 14:48:57,878 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 9 from Vox, 27 from AS
2024-08-13 14:49:11,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2174810.0, ans=0.0
2024-08-13 14:49:14,392 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 100, loss[loss=0.09624, beats_loss=0.01011, ecapa_loss=0.0001375, whisper_loss=0.08475, over 14583.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009756, ecapa_loss=0.0001692, whisper_loss=0.09023, over 1522125.44 frames. ], batch size: 56, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:49:17,505 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0
2024-08-13 14:49:22,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2174810.0, ans=0.2
2024-08-13 14:50:00,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2174910.0, ans=0.125
2024-08-13 14:50:14,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2174910.0, ans=0.0
2024-08-13 14:50:36,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2175010.0, ans=0.0
2024-08-13 14:51:36,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2175110.0, ans=0.125
2024-08-13 14:52:32,785 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 150, loss[loss=0.09821, beats_loss=0.01101, ecapa_loss=0.0001672, whisper_loss=0.08553, over 21542.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.009914, ecapa_loss=0.0001669, whisper_loss=0.0918, over 2041415.01 frames.
], batch size: 86, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:52:44,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2175310.0, ans=0.0
2024-08-13 14:53:06,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.627e+01 2.921e+01 3.180e+01 8.449e+01, threshold=5.841e+01, percent-clipped=2.0
2024-08-13 14:53:45,067 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 from AS
2024-08-13 14:54:10,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2175610.0, ans=0.0
2024-08-13 14:54:30,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2175610.0, ans=0.2
2024-08-13 14:54:40,971 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 from AS
2024-08-13 14:54:57,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2175710.0, ans=0.125
2024-08-13 14:54:59,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2175710.0, ans=0.0
2024-08-13 14:55:05,155 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 200, loss[loss=0.1031, beats_loss=0.009847, ecapa_loss=0.0001732, whisper_loss=0.09157, over 17710.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0101, ecapa_loss=0.0001659, whisper_loss=0.09207, over 2431984.50 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:55:32,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2175910.0, ans=0.0
2024-08-13 14:55:38,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2175910.0, ans=0.0
2024-08-13 14:55:39,852 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 from AS
2024-08-13 14:55:46,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2176010.0, ans=0.125
2024-08-13 14:56:00,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2176110.0, ans=0.125
2024-08-13 14:56:06,258 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 32 from LS+wenet, 19 from Vox, 23 from AS
2024-08-13 14:56:28,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2176210.0, ans=0.1
2024-08-13 14:56:29,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2176210.0, ans=0.0
2024-08-13 14:56:32,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 250, loss[loss=0.1334, beats_loss=0.008587, ecapa_loss=0.0001774, whisper_loss=0.1231, over 21314.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01024, ecapa_loss=0.000164, whisper_loss=0.09269, over 2747773.77 frames. ], batch size: 81, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 14:56:46,640 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS
2024-08-13 14:56:47,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.294e+01 2.573e+01 2.919e+01 5.746e+01, threshold=5.146e+01, percent-clipped=0.0
2024-08-13 14:56:51,763 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 from AS
2024-08-13 14:57:07,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2176510.0, ans=0.2
2024-08-13 14:57:11,975 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 12 from Vox, 35 from AS
2024-08-13 14:57:12,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2176510.0, ans=0.125
2024-08-13 14:57:23,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2176610.0, ans=0.125
2024-08-13 14:57:56,054 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 300, loss[loss=0.09287, beats_loss=0.01429, ecapa_loss=0.0001278, whisper_loss=0.0773, over 14756.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01043, ecapa_loss=0.0001631, whisper_loss=0.09142, over 2958551.17 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 14:58:09,745 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 17 from Vox, 38 from AS
2024-08-13 14:58:29,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2177010.0, ans=0.1
2024-08-13 14:58:39,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0
2024-08-13 14:58:54,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.44 vs.
limit=10.0
2024-08-13 14:59:08,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5
2024-08-13 14:59:15,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 350, loss[loss=0.11, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.09812, over 22379.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001627, whisper_loss=0.09108, over 3142248.15 frames. ], batch size: 86, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 14:59:31,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.361e+01 2.664e+01 2.951e+01 4.705e+01, threshold=5.328e+01, percent-clipped=0.0
2024-08-13 14:59:36,398 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 from AS
2024-08-13 14:59:55,388 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 from AS
2024-08-13 15:00:32,688 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 400, loss[loss=0.1128, beats_loss=0.009968, ecapa_loss=0.0001733, whisper_loss=0.1011, over 14268.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001624, whisper_loss=0.0911, over 3308085.86 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:00:46,201 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 11 from Vox, 33 from AS
2024-08-13 15:01:22,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2178110.0, ans=0.125
2024-08-13 15:01:41,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2178210.0, ans=0.1
2024-08-13 15:01:44,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0
2024-08-13 15:01:47,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 450, loss[loss=0.09745, beats_loss=0.01261, ecapa_loss=0.0001102, whisper_loss=0.08374, over 19268.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001623, whisper_loss=0.09029, over 3430773.31 frames. ], batch size: 74, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:01:52,508 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 13 from Vox, 34 from AS
2024-08-13 15:02:02,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.405e+01 2.560e+01 2.967e+01 1.017e+02, threshold=5.120e+01, percent-clipped=1.0
2024-08-13 15:02:05,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2178410.0, ans=0.125
2024-08-13 15:02:11,200 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0
2024-08-13 15:02:27,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-08-13 15:02:46,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2178710.0, ans=0.1
2024-08-13 15:03:00,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 500, loss[loss=0.09209, beats_loss=0.01041, ecapa_loss=0.0001505, whisper_loss=0.08017, over 18080.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001634, whisper_loss=0.09062, over 3508419.41 frames. ], batch size: 70, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:03:40,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0
2024-08-13 15:03:41,564 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0
2024-08-13 15:03:45,175 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS
2024-08-13 15:03:46,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2179110.0, ans=0.1
2024-08-13 15:04:14,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 550, loss[loss=0.07767, beats_loss=0.01285, ecapa_loss=0.0001338, whisper_loss=0.06349, over 16098.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.000163, whisper_loss=0.08972, over 3637085.71 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:04:20,676 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 33 from LS+wenet, 23 from Vox, 26 from AS
2024-08-13 15:04:22,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2179310.0, ans=0.125
2024-08-13 15:04:28,260 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 26 from Vox, 22 from AS
2024-08-13 15:04:29,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.286e+01 2.542e+01 2.908e+01 4.014e+01, threshold=5.083e+01, percent-clipped=0.0
2024-08-13 15:04:30,764 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 from AS
2024-08-13 15:04:58,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2179610.0, ans=0.1
2024-08-13 15:05:09,579 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 13 from Vox, 39 from AS
2024-08-13 15:05:19,216 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts.
21 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 15:05:23,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2024-08-13 15:05:26,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.73 vs. limit=22.5
2024-08-13 15:05:29,013 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 600, loss[loss=0.1074, beats_loss=0.01037, ecapa_loss=0.0001527, whisper_loss=0.09547, over 21003.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.000162, whisper_loss=0.08991, over 3661940.89 frames. ], batch size: 84, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:05:40,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2179810.0, ans=0.0
2024-08-13 15:05:45,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2179910.0, ans=0.125
2024-08-13 15:05:56,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0
2024-08-13 15:06:14,342 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 19 from Vox, 41 from AS
2024-08-13 15:06:17,397 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS
2024-08-13 15:06:26,846 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 from AS
2024-08-13 15:06:28,130 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 30 from Vox, 30 from AS
2024-08-13 15:06:29,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2180210.0, ans=0.0
2024-08-13 15:06:40,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 650, loss[loss=0.1086, beats_loss=0.01124, ecapa_loss=0.0001616, whisper_loss=0.09571, over 21702.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001621, whisper_loss=0.08976, over 3713753.10 frames. ], batch size: 85, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:06:55,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.462e+01 2.734e+01 3.167e+01 1.676e+02, threshold=5.468e+01, percent-clipped=3.0
2024-08-13 15:07:00,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0
2024-08-13 15:07:39,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2180710.0, ans=0.0
2024-08-13 15:07:45,219 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0
2024-08-13 15:07:47,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2180710.0, ans=0.1
2024-08-13 15:07:47,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2180710.0, ans=0.125
2024-08-13 15:07:53,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2180810.0, ans=0.125
2024-08-13 15:07:54,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 700, loss[loss=0.1087, beats_loss=0.01076, ecapa_loss=0.000165, whisper_loss=0.09626, over 18718.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001633, whisper_loss=0.09052, over 3731976.10 frames. ], batch size: 74, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:08:00,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2180810.0, ans=0.125
2024-08-13 15:08:12,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2180910.0, ans=0.0
2024-08-13 15:08:12,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2180910.0, ans=0.0
2024-08-13 15:08:13,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2180910.0, ans=0.125
2024-08-13 15:08:42,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2181110.0, ans=0.07
2024-08-13 15:08:44,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2181110.0, ans=0.0
2024-08-13 15:09:06,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 750, loss[loss=0.1019, beats_loss=0.01129, ecapa_loss=0.0001615, whisper_loss=0.08904, over 16702.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001627, whisper_loss=0.09071, over 3733231.62 frames.
], batch size: 66, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:09:14,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2181310.0, ans=0.0
2024-08-13 15:09:21,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.329e+01 2.542e+01 2.977e+01 1.085e+02, threshold=5.083e+01, percent-clipped=1.0
2024-08-13 15:09:28,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2181410.0, ans=0.5
2024-08-13 15:09:32,429 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 24 from Vox, 32 from AS
2024-08-13 15:09:34,000 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 29 from Vox, 22 from AS
2024-08-13 15:09:40,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=2181510.0, ans=0.1
2024-08-13 15:09:40,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2181510.0, ans=0.125
2024-08-13 15:09:47,707 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 12 from Vox, 30 from AS
2024-08-13 15:09:52,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2181610.0, ans=0.125
2024-08-13 15:09:58,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2181610.0, ans=0.125
2024-08-13 15:10:17,870 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 23 from Vox, 26 from AS
2024-08-13 15:10:18,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=12.0
2024-08-13 15:10:18,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 800, loss[loss=0.0844, beats_loss=0.009154, ecapa_loss=0.0001443, whisper_loss=0.0738, over 17512.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001623, whisper_loss=0.09001, over 3748397.77 frames. ], batch size: 67, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:10:29,174 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 from AS
2024-08-13 15:10:45,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2181910.0, ans=0.125
2024-08-13 15:10:52,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2182010.0, ans=0.0
2024-08-13 15:10:58,140 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 15:11:22,296 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS
2024-08-13 15:11:22,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2182210.0, ans=0.125
2024-08-13 15:11:25,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2182210.0, ans=0.125
2024-08-13 15:11:32,344 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 850, loss[loss=0.1047, beats_loss=0.01139, ecapa_loss=0.0001488, whisper_loss=0.09177, over 17800.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001601, whisper_loss=0.08966, over 3786628.47 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:11:46,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.398e+01 2.663e+01 2.990e+01 7.176e+01, threshold=5.326e+01, percent-clipped=1.0
2024-08-13 15:11:47,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2182410.0, ans=0.125
2024-08-13 15:11:58,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2182410.0, ans=0.025
2024-08-13 15:12:03,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2182510.0, ans=0.0
2024-08-13 15:12:24,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2182610.0, ans=0.125
2024-08-13 15:12:44,511 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 900, loss[loss=0.1173, beats_loss=0.009497, ecapa_loss=0.0001762, whisper_loss=0.1061, over 17080.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001602, whisper_loss=0.08976, over 3811792.03 frames. ], batch size: 67, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:12:47,563 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 from AS
2024-08-13 15:13:00,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2182910.0, ans=0.125
2024-08-13 15:13:16,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2183010.0, ans=0.2
2024-08-13 15:13:30,060 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts.
12 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 15:13:38,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2183110.0, ans=0.1 2024-08-13 15:13:38,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2183110.0, ans=0.125 2024-08-13 15:13:46,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2183210.0, ans=0.2 2024-08-13 15:13:59,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 950, loss[loss=0.123, beats_loss=0.0103, ecapa_loss=0.0001385, whisper_loss=0.1113, over 24530.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01072, ecapa_loss=0.0001587, whisper_loss=0.08967, over 3822478.95 frames. ], batch size: 91, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:14:06,678 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 15:14:13,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.387e+01 2.716e+01 2.954e+01 4.081e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 15:14:15,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2183410.0, ans=0.125 2024-08-13 15:14:32,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2183510.0, ans=0.0 2024-08-13 15:14:36,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2183510.0, ans=0.125 2024-08-13 15:14:46,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2183610.0, ans=0.125 2024-08-13 15:14:49,992 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
23 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 15:14:53,213 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 15:14:57,786 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 15:15:00,443 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 15:15:03,822 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-13 15:15:10,832 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 15:15:14,396 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1000, loss[loss=0.1005, beats_loss=0.01014, ecapa_loss=0.0001361, whisper_loss=0.08897, over 19505.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001599, whisper_loss=0.09052, over 3821037.32 frames. ], batch size: 74, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:15:20,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2024-08-13 15:16:06,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2184110.0, ans=0.1 2024-08-13 15:16:18,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2184210.0, ans=0.0 2024-08-13 15:16:29,317 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1050, loss[loss=0.09973, beats_loss=0.01132, ecapa_loss=0.000124, whisper_loss=0.08717, over 19081.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001596, whisper_loss=0.09018, over 3826206.09 frames. 
], batch size: 68, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:16:29,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2184310.0, ans=0.95 2024-08-13 15:16:37,435 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-13 15:16:38,360 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 15:16:40,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-13 15:16:43,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.435e+01 2.686e+01 3.027e+01 6.105e+01, threshold=5.372e+01, percent-clipped=2.0 2024-08-13 15:16:47,123 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 15:16:50,946 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.00 vs. limit=5.0 2024-08-13 15:17:09,957 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 15:17:10,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2184510.0, ans=0.1 2024-08-13 15:17:10,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2184510.0, ans=0.125 2024-08-13 15:17:13,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2184610.0, ans=0.125 2024-08-13 15:17:15,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2184610.0, ans=0.125 2024-08-13 15:17:18,149 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-13 15:17:18,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2184610.0, ans=0.125 2024-08-13 15:17:31,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2184710.0, ans=0.0 2024-08-13 15:17:35,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2184710.0, ans=0.0 2024-08-13 15:17:42,369 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 15:17:43,779 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1100, loss[loss=0.0933, beats_loss=0.01176, ecapa_loss=0.0001344, whisper_loss=0.08019, over 21375.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001581, whisper_loss=0.09044, over 3809911.82 frames. 
], batch size: 83, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:17:50,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2184810.0, ans=0.5 2024-08-13 15:17:58,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2184910.0, ans=0.125 2024-08-13 15:18:22,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2185010.0, ans=0.2 2024-08-13 15:18:26,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2185010.0, ans=0.1 2024-08-13 15:18:39,814 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 9 from Vox, 42 fro AS 2024-08-13 15:18:40,943 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-13 15:18:42,227 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 15:18:42,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2185110.0, ans=0.2 2024-08-13 15:18:46,954 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-13 15:18:56,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2185210.0, ans=0.1 2024-08-13 15:19:00,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1150, loss[loss=0.1008, beats_loss=0.01261, ecapa_loss=0.0001296, whisper_loss=0.08686, over 18159.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001583, whisper_loss=0.09021, over 3819972.09 frames. 
], batch size: 69, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:19:08,810 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 15 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-13 15:19:09,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2185310.0, ans=0.1 2024-08-13 15:19:16,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.495e+01 2.743e+01 3.086e+01 4.866e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 15:19:28,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2185410.0, ans=0.05 2024-08-13 15:19:31,228 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 15:19:34,199 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 15:19:50,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2185610.0, ans=0.125 2024-08-13 15:20:28,761 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-13 15:20:39,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2185710.0, ans=0.1 2024-08-13 15:20:45,452 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1200, loss[loss=0.1036, beats_loss=0.01125, ecapa_loss=0.000161, whisper_loss=0.09073, over 21869.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001578, whisper_loss=0.09008, over 3835880.37 frames. ], batch size: 89, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:20:56,206 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-13 15:20:57,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2024-08-13 15:21:04,839 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 15:21:06,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2185910.0, ans=0.1 2024-08-13 15:21:08,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2185910.0, ans=0.0 2024-08-13 15:21:11,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2185910.0, ans=0.2 2024-08-13 15:21:22,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2186010.0, ans=0.0 2024-08-13 15:21:24,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2186010.0, ans=0.0 2024-08-13 15:21:29,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2186010.0, ans=0.125 2024-08-13 15:21:37,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2186110.0, ans=0.0 2024-08-13 15:21:38,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2186110.0, ans=0.0 2024-08-13 15:21:45,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2186110.0, ans=0.125 2024-08-13 15:22:04,312 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-13 15:22:07,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1250, loss[loss=0.1072, beats_loss=0.01059, ecapa_loss=0.0001519, whisper_loss=0.09507, over 23138.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0108, ecapa_loss=0.000156, whisper_loss=0.0901, over 3845296.65 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:22:14,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2186310.0, ans=0.1 2024-08-13 15:22:22,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.204e+01 2.472e+01 2.749e+01 3.995e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-13 15:22:36,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2186510.0, ans=0.1 2024-08-13 15:22:44,664 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 32 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 15:23:00,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2186610.0, ans=0.125 2024-08-13 15:23:25,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1300, loss[loss=0.09297, beats_loss=0.01196, ecapa_loss=0.0001358, whisper_loss=0.07965, over 14116.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001558, whisper_loss=0.09066, over 3848180.26 frames. ], batch size: 54, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:23:41,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2186910.0, ans=0.125 2024-08-13 15:23:50,481 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 15:23:58,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2187010.0, ans=10.0 2024-08-13 15:24:03,016 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=12.0 2024-08-13 15:24:32,368 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 15:24:33,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.03 vs. limit=22.5 2024-08-13 15:24:33,989 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 15:24:41,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1350, loss[loss=0.0809, beats_loss=0.01323, ecapa_loss=0.0001351, whisper_loss=0.06632, over 22895.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001558, whisper_loss=0.09132, over 3847964.31 frames. ], batch size: 93, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:24:55,300 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-13 15:25:00,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.385e+01 2.728e+01 3.101e+01 1.009e+02, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:25:09,089 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 15:25:09,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2187410.0, ans=0.1 2024-08-13 15:25:29,762 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. 
limit=15.0 2024-08-13 15:25:49,155 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 24 from LS+wenet, 15 from Vox, 14 fro AS 2024-08-13 15:25:57,892 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1400, loss[loss=0.1137, beats_loss=0.0112, ecapa_loss=0.0001885, whisper_loss=0.1006, over 22014.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001576, whisper_loss=0.09121, over 3842300.20 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:26:18,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2187910.0, ans=0.125 2024-08-13 15:26:21,167 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 15:26:29,269 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-13 15:26:33,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2188010.0, ans=0.125 2024-08-13 15:26:33,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2188010.0, ans=0.125 2024-08-13 15:26:57,466 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 15:27:05,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2188210.0, ans=0.0 2024-08-13 15:27:23,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1450, loss[loss=0.09813, beats_loss=0.01147, ecapa_loss=0.000136, whisper_loss=0.08529, over 19983.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01079, ecapa_loss=0.0001558, whisper_loss=0.08984, over 3833557.73 frames. 
], batch size: 77, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:27:27,429 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 15:27:40,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-13 15:27:40,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.325e+01 2.552e+01 2.880e+01 5.017e+01, threshold=5.104e+01, percent-clipped=1.0 2024-08-13 15:27:43,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2188410.0, ans=0.125 2024-08-13 15:27:48,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. limit=10.0 2024-08-13 15:27:49,428 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 15:28:05,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2188510.0, ans=0.1 2024-08-13 15:28:17,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2188610.0, ans=0.125 2024-08-13 15:28:20,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2188610.0, ans=0.125 2024-08-13 15:28:41,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2188710.0, ans=0.125 2024-08-13 15:28:43,972 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1500, loss[loss=0.09077, beats_loss=0.01287, ecapa_loss=0.00014, whisper_loss=0.0765, over 18060.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01083, ecapa_loss=0.0001558, whisper_loss=0.08905, over 3818736.16 frames. ], batch size: 70, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:29:06,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2188910.0, ans=0.0 2024-08-13 15:29:08,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2188910.0, ans=0.0 2024-08-13 15:29:14,324 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 15:29:19,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2189010.0, ans=0.2 2024-08-13 15:29:22,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2189010.0, ans=0.1 2024-08-13 15:29:59,135 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 15:29:59,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2189210.0, ans=0.2 2024-08-13 15:30:04,735 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1550, loss[loss=0.1175, beats_loss=0.01095, ecapa_loss=0.0001749, whisper_loss=0.1048, over 21892.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01084, ecapa_loss=0.0001552, whisper_loss=0.08983, over 3847612.08 frames. ], batch size: 89, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:30:08,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. 
limit=15.0 2024-08-13 15:30:19,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2189310.0, ans=0.125 2024-08-13 15:30:23,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.248e+01 2.490e+01 2.864e+01 4.046e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-13 15:30:43,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2024-08-13 15:30:51,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-08-13 15:31:24,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2189710.0, ans=0.0 2024-08-13 15:31:26,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1600, loss[loss=0.0806, beats_loss=0.01302, ecapa_loss=0.0001426, whisper_loss=0.06615, over 19113.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0109, ecapa_loss=0.0001543, whisper_loss=0.08971, over 3869160.98 frames. ], batch size: 77, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:31:27,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2189810.0, ans=0.2 2024-08-13 15:31:29,214 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-13 15:31:44,521 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.889e+05 2024-08-13 15:31:55,279 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-13 15:32:26,368 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 15:32:32,396 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 15:32:46,625 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1650, loss[loss=0.0825, beats_loss=0.01139, ecapa_loss=0.0001445, whisper_loss=0.06967, over 15230.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001544, whisper_loss=0.09046, over 3844556.94 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:33:03,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.370e+01 2.654e+01 3.120e+01 7.882e+01, threshold=5.308e+01, percent-clipped=3.0 2024-08-13 15:33:15,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2190410.0, ans=0.125 2024-08-13 15:33:22,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2190510.0, ans=0.07 2024-08-13 15:33:36,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2190610.0, ans=0.125 2024-08-13 15:33:37,770 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.605e-02 2024-08-13 15:33:38,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-13 15:33:41,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=22.5 2024-08-13 15:33:50,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. 
limit=6.0 2024-08-13 15:33:54,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2190710.0, ans=15.0 2024-08-13 15:34:00,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2024-08-13 15:34:04,933 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1700, loss[loss=0.1166, beats_loss=0.008516, ecapa_loss=0.0001822, whisper_loss=0.1063, over 20960.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001554, whisper_loss=0.09112, over 3881860.09 frames. ], batch size: 81, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:34:10,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2190810.0, ans=0.0 2024-08-13 15:34:10,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2190810.0, ans=0.125 2024-08-13 15:34:20,764 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.194e-03 2024-08-13 15:34:30,318 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 15:34:38,645 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 15:34:38,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2191010.0, ans=0.015 2024-08-13 15:34:39,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2191010.0, ans=0.125 2024-08-13 15:34:56,108 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 38 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 15:35:00,859 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
18 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-13 15:35:05,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-13 15:35:06,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2191210.0, ans=0.0 2024-08-13 15:35:10,760 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 15:35:21,444 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1750, loss[loss=0.08626, beats_loss=0.01208, ecapa_loss=0.0001469, whisper_loss=0.07271, over 22637.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001558, whisper_loss=0.09096, over 3918659.73 frames. ], batch size: 92, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:35:28,680 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 15:35:29,876 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-13 15:35:37,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.448e+01 2.728e+01 3.089e+01 6.360e+01, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:35:45,458 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 15:36:27,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2191710.0, ans=0.1 2024-08-13 15:36:28,251 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-13 15:36:35,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1800, loss[loss=0.1017, beats_loss=0.009334, ecapa_loss=0.0001829, whisper_loss=0.09053, over 23409.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001564, whisper_loss=0.09063, over 3908600.18 frames. ], batch size: 94, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:36:38,828 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-13 15:36:40,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2191810.0, ans=0.125 2024-08-13 15:36:42,876 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 15:37:00,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2191910.0, ans=0.125 2024-08-13 15:37:08,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2192010.0, ans=0.1 2024-08-13 15:37:20,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2192110.0, ans=0.015 2024-08-13 15:37:50,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1850, loss[loss=0.1016, beats_loss=0.009521, ecapa_loss=0.0001818, whisper_loss=0.09027, over 18449.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001569, whisper_loss=0.09087, over 3884109.83 frames. ], batch size: 76, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:37:55,178 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
19 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 15:38:03,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2192310.0, ans=0.07 2024-08-13 15:38:06,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.626e+01 2.890e+01 6.922e+01, threshold=5.252e+01, percent-clipped=1.0 2024-08-13 15:38:19,335 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2024-08-13 15:38:25,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2192510.0, ans=0.1 2024-08-13 15:38:32,844 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 15:38:42,086 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 15:38:42,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-13 15:39:03,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1900, loss[loss=0.08913, beats_loss=0.01072, ecapa_loss=0.0001472, whisper_loss=0.07693, over 16394.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001567, whisper_loss=0.09149, over 3873635.33 frames. ], batch size: 66, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:39:03,566 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 15:39:04,756 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 15:39:09,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.93 vs. 
limit=5.0 2024-08-13 15:39:16,969 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 15:39:30,837 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 15:39:34,609 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-08-13 15:40:18,153 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1950, loss[loss=0.1201, beats_loss=0.01018, ecapa_loss=0.0001794, whisper_loss=0.1081, over 20322.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001575, whisper_loss=0.09158, over 3868655.35 frames. ], batch size: 81, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:40:18,265 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-13 15:40:20,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2193310.0, ans=0.0 2024-08-13 15:40:34,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.351e+01 2.582e+01 2.888e+01 8.249e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-13 15:40:34,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2193410.0, ans=0.125 2024-08-13 15:41:12,995 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 15:41:21,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2193710.0, ans=0.125 2024-08-13 15:41:27,355 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-13 15:41:30,525 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
25 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 15:41:33,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2000, loss[loss=0.1103, beats_loss=0.01102, ecapa_loss=0.0001452, whisper_loss=0.09778, over 17484.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001576, whisper_loss=0.09126, over 3826868.47 frames. ], batch size: 69, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:41:40,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2193810.0, ans=0.025 2024-08-13 15:41:52,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=12.0 2024-08-13 15:42:03,109 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.195e+05 2024-08-13 15:42:06,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-13 15:42:14,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2194010.0, ans=0.125 2024-08-13 15:42:23,916 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
29 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 15:42:27,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2194110.0, ans=0.0 2024-08-13 15:42:38,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2194210.0, ans=0.1 2024-08-13 15:42:38,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2194210.0, ans=0.0 2024-08-13 15:42:38,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-13 15:42:47,759 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2050, loss[loss=0.1057, beats_loss=0.01213, ecapa_loss=0.0001654, whisper_loss=0.09193, over 15639.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001575, whisper_loss=0.09032, over 3810879.43 frames. ], batch size: 64, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:42:48,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2194310.0, ans=0.0 2024-08-13 15:42:49,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2194310.0, ans=6.0 2024-08-13 15:42:50,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2024-08-13 15:43:03,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.358e+01 2.622e+01 3.012e+01 4.492e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-13 15:43:14,865 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 15:43:22,508 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
20 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 15:43:36,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2194610.0, ans=0.125 2024-08-13 15:43:46,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2194710.0, ans=0.0 2024-08-13 15:43:48,691 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 15:43:48,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2194710.0, ans=0.2 2024-08-13 15:43:48,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2194710.0, ans=0.125 2024-08-13 15:44:02,124 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2100, loss[loss=0.1054, beats_loss=0.009475, ecapa_loss=0.0001615, whisper_loss=0.09431, over 19729.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.0001565, whisper_loss=0.08993, over 3813163.76 frames. ], batch size: 76, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:44:10,507 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 15:44:21,054 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.772e+01 2024-08-13 15:44:25,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=12.0 2024-08-13 15:44:49,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2195110.0, ans=0.125 2024-08-13 15:44:58,519 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-13 15:45:01,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2195210.0, ans=0.125 2024-08-13 15:45:04,154 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 15:45:14,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2150, loss[loss=0.113, beats_loss=0.01058, ecapa_loss=0.0001325, whisper_loss=0.1011, over 23101.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001566, whisper_loss=0.09052, over 3804913.26 frames. ], batch size: 91, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:45:30,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.427e+01 2.711e+01 3.071e+01 5.101e+01, threshold=5.422e+01, percent-clipped=0.0 2024-08-13 15:45:53,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2195510.0, ans=0.0 2024-08-13 15:45:53,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2195510.0, ans=0.125 2024-08-13 15:46:00,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2195610.0, ans=0.125 2024-08-13 15:46:14,698 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 15:46:20,217 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 29 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 15:46:29,491 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2200, loss[loss=0.1417, beats_loss=0.009651, ecapa_loss=0.0001502, whisper_loss=0.1306, over 16840.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.000156, whisper_loss=0.09093, over 3849997.87 frames. 
], batch size: 61, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:46:43,799 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-13 15:46:49,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2195910.0, ans=0.0 2024-08-13 15:46:50,991 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 15:47:11,847 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-13 15:47:16,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2196110.0, ans=0.125 2024-08-13 15:47:27,438 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 15:47:45,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2250, loss[loss=0.1183, beats_loss=0.01007, ecapa_loss=0.0001557, whisper_loss=0.1067, over 23609.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001585, whisper_loss=0.09191, over 3867370.22 frames. ], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:47:52,951 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 15:47:53,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-08-13 15:48:01,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.332e+01 2.611e+01 2.967e+01 5.729e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 15:48:19,671 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 15:48:30,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-13 15:48:42,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2024-08-13 15:48:55,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2196710.0, ans=0.125 2024-08-13 15:49:00,476 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2300, loss[loss=0.1199, beats_loss=0.0115, ecapa_loss=0.0001632, whisper_loss=0.1067, over 23424.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001579, whisper_loss=0.0921, over 3874040.45 frames. ], batch size: 93, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:49:01,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=12.0 2024-08-13 15:49:05,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2196810.0, ans=0.0 2024-08-13 15:49:27,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=12.0 2024-08-13 15:49:44,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.01 vs. 
limit=15.0 2024-08-13 15:49:48,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2197110.0, ans=0.125 2024-08-13 15:50:14,900 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2350, loss[loss=0.094, beats_loss=0.01025, ecapa_loss=0.0001744, whisper_loss=0.08201, over 16063.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0108, ecapa_loss=0.0001588, whisper_loss=0.09198, over 3856375.45 frames. ], batch size: 64, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:50:16,454 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 15:50:19,379 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 15:50:28,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2197410.0, ans=0.125 2024-08-13 15:50:31,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.477e+01 2.777e+01 3.066e+01 6.337e+01, threshold=5.554e+01, percent-clipped=1.0 2024-08-13 15:50:32,938 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 15:50:44,790 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 15:50:48,938 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 15:50:49,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2197510.0, ans=0.125 2024-08-13 15:50:50,215 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 15:51:03,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. 
limit=22.5 2024-08-13 15:51:19,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2197710.0, ans=0.125 2024-08-13 15:51:30,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2400, loss[loss=0.08184, beats_loss=0.01331, ecapa_loss=0.000155, whisper_loss=0.06698, over 21973.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001595, whisper_loss=0.09212, over 3849720.25 frames. ], batch size: 94, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:51:31,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2197810.0, ans=0.125 2024-08-13 15:51:34,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2197810.0, ans=0.125 2024-08-13 15:52:04,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2198010.0, ans=0.125 2024-08-13 15:52:07,972 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 16 from LS+wenet, 27 from Vox, 53 fro AS 2024-08-13 15:52:08,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2198010.0, ans=0.2 2024-08-13 15:52:09,389 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 15:52:20,604 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 15:52:37,477 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 15:52:42,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2450, loss[loss=0.1054, beats_loss=0.01026, ecapa_loss=0.0001738, whisper_loss=0.09342, over 22676.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01075, ecapa_loss=0.0001601, whisper_loss=0.09138, over 3866628.97 frames. 
], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:52:42,602 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-13 15:52:44,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2198310.0, ans=0.125 2024-08-13 15:52:51,703 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.489e-01 2024-08-13 15:52:58,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.483e+01 2.773e+01 3.111e+01 4.520e+01, threshold=5.546e+01, percent-clipped=0.0 2024-08-13 15:53:00,266 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 15:53:10,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2198510.0, ans=22.5 2024-08-13 15:53:32,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2198610.0, ans=0.125 2024-08-13 15:53:39,179 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 15:53:53,735 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2500, loss[loss=0.1101, beats_loss=0.01057, ecapa_loss=0.0001686, whisper_loss=0.09786, over 15022.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001611, whisper_loss=0.09147, over 3853230.19 frames. ], batch size: 55, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:53:56,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2198810.0, ans=0.125 2024-08-13 15:53:57,764 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-13 15:54:01,035 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 15:54:06,671 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-13 15:54:32,507 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 15:54:59,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2199210.0, ans=0.125 2024-08-13 15:55:06,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2550, loss[loss=0.09685, beats_loss=0.01292, ecapa_loss=0.0001263, whisper_loss=0.08266, over 22212.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01083, ecapa_loss=0.0001602, whisper_loss=0.09089, over 3857657.50 frames. ], batch size: 90, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:55:06,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2199310.0, ans=0.125 2024-08-13 15:55:07,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2199310.0, ans=0.125 2024-08-13 15:55:21,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.351e+01 2.676e+01 3.107e+01 6.569e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 15:55:27,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. 
limit=12.0 2024-08-13 15:55:54,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2199610.0, ans=0.0 2024-08-13 15:56:04,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=2199710.0, ans=0.2 2024-08-13 15:56:17,873 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2600, loss[loss=0.1061, beats_loss=0.01301, ecapa_loss=9.998e-05, whisper_loss=0.09207, over 19788.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001596, whisper_loss=0.09108, over 3880016.51 frames. ], batch size: 73, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:56:23,685 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 15:56:27,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2199810.0, ans=0.125 2024-08-13 15:56:41,055 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-13 15:57:20,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2200210.0, ans=0.125 2024-08-13 15:57:33,079 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2650, loss[loss=0.1084, beats_loss=0.01095, ecapa_loss=0.0001304, whisper_loss=0.0961, over 19492.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001591, whisper_loss=0.0913, over 3862070.49 frames. ], batch size: 74, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:57:33,254 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 15:57:43,985 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 15:57:44,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2200310.0, ans=0.0 2024-08-13 15:57:49,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.280e+01 2.561e+01 2.894e+01 4.049e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-13 15:57:57,139 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 19 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-13 15:58:05,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2200510.0, ans=0.125 2024-08-13 15:58:06,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2200510.0, ans=0.2 2024-08-13 15:58:12,232 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 15:58:39,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2200710.0, ans=0.1 2024-08-13 15:58:46,463 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 36 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 15:58:52,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2700, loss[loss=0.07562, beats_loss=0.01417, ecapa_loss=0.0001209, whisper_loss=0.06024, over 14374.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001595, whisper_loss=0.09045, over 3843391.80 frames. ], batch size: 58, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:59:00,023 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 15:59:05,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2200810.0, ans=0.1 2024-08-13 15:59:11,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2200910.0, ans=0.125 2024-08-13 15:59:41,956 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-13 15:59:46,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2201110.0, ans=0.2 2024-08-13 15:59:54,485 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 16:00:17,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2750, loss[loss=0.101, beats_loss=0.009026, ecapa_loss=0.0001735, whisper_loss=0.09029, over 18486.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001599, whisper_loss=0.09058, over 3862982.46 frames. ], batch size: 76, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:00:19,224 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-13 16:00:27,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.93 vs. 
limit=15.0 2024-08-13 16:00:34,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.357e+01 2.643e+01 3.055e+01 4.900e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-13 16:00:36,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2201410.0, ans=0.1 2024-08-13 16:00:54,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2201510.0, ans=0.125 2024-08-13 16:01:20,312 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 16:01:31,469 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 16:01:33,845 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2800, loss[loss=0.08498, beats_loss=0.01231, ecapa_loss=0.0001509, whisper_loss=0.07116, over 23187.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01091, ecapa_loss=0.0001589, whisper_loss=0.09094, over 3881033.74 frames. ], batch size: 94, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:01:48,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2201910.0, ans=10.0 2024-08-13 16:02:29,523 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 16:02:31,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-13 16:02:43,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2202210.0, ans=0.125 2024-08-13 16:02:50,893 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2850, loss[loss=0.1126, beats_loss=0.01007, ecapa_loss=0.000186, whisper_loss=0.1007, over 22337.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001596, whisper_loss=0.09161, over 3882323.41 frames. ], batch size: 88, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:03:00,612 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 16:03:03,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2202310.0, ans=0.0 2024-08-13 16:03:08,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.307e+01 2.674e+01 3.004e+01 5.549e+01, threshold=5.349e+01, percent-clipped=1.0 2024-08-13 16:03:09,330 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 16:03:19,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2202410.0, ans=0.125 2024-08-13 16:03:24,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2202510.0, ans=0.1 2024-08-13 16:03:27,356 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 22 from LS+wenet, 21 from Vox, 11 fro AS 2024-08-13 16:03:27,713 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:03:48,535 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 16:03:50,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2202610.0, ans=0.2 2024-08-13 16:03:51,962 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 16:03:53,482 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
11 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 16:03:56,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2202710.0, ans=0.125 2024-08-13 16:03:59,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2202710.0, ans=0.125 2024-08-13 16:04:10,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2900, loss[loss=0.08958, beats_loss=0.01088, ecapa_loss=0.0001399, whisper_loss=0.0773, over 14986.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.000161, whisper_loss=0.09179, over 3905353.13 frames. ], batch size: 58, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:04:10,391 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-13 16:04:17,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2202810.0, ans=0.125 2024-08-13 16:04:27,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2202910.0, ans=0.1 2024-08-13 16:04:39,739 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 16:04:48,199 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 16:04:53,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2203010.0, ans=0.125 2024-08-13 16:05:08,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2203210.0, ans=0.2 2024-08-13 16:05:10,533 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
18 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 16:05:22,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2024-08-13 16:05:23,044 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 2950, loss[loss=0.09713, beats_loss=0.009531, ecapa_loss=0.0001701, whisper_loss=0.08589, over 22317.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001615, whisper_loss=0.09116, over 3924512.32 frames. ], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:05:38,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.341e+01 2.613e+01 3.038e+01 5.265e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-13 16:05:43,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=12.0 2024-08-13 16:05:57,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2203510.0, ans=10.0 2024-08-13 16:06:07,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2203610.0, ans=0.1 2024-08-13 16:06:16,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2203610.0, ans=0.125 2024-08-13 16:06:32,166 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3000, loss[loss=0.08602, beats_loss=0.008407, ecapa_loss=0.000155, whisper_loss=0.07607, over 15909.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001618, whisper_loss=0.09164, over 3939325.52 frames. 
], batch size: 60, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:06:32,166 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 16:07:12,428 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2474, over 922467.00 frames. 2024-08-13 16:07:30,279 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on SV_voxceleb1: loss=0.004334, beats_loss=0, ecapa_loss=0.0004334, whisper_loss=0, over 939242.00 frames. 2024-08-13 16:09:55,731 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on AT_audioset: loss=0.02373, beats_loss=0.02373, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 16:09:55,735 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-13 16:10:02,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2203810.0, ans=0.125 2024-08-13 16:10:11,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2203910.0, ans=0.125 2024-08-13 16:10:14,541 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 16:10:47,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-13 16:10:47,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-08-13 16:11:01,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=12.0 2024-08-13 16:11:04,946 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.79 vs. 
limit=22.5 2024-08-13 16:11:14,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2204210.0, ans=0.035 2024-08-13 16:11:27,486 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3050, loss[loss=0.1081, beats_loss=0.01236, ecapa_loss=0.0001677, whisper_loss=0.09409, over 22624.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001612, whisper_loss=0.09132, over 3937675.86 frames. ], batch size: 93, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:11:30,415 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-13 16:11:35,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2204310.0, ans=0.0 2024-08-13 16:11:38,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2204310.0, ans=0.1 2024-08-13 16:11:41,126 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 16:11:42,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.448e+01 2.786e+01 3.089e+01 5.850e+01, threshold=5.572e+01, percent-clipped=2.0 2024-08-13 16:11:53,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2204510.0, ans=0.2 2024-08-13 16:11:57,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2204510.0, ans=0.125 2024-08-13 16:12:13,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2204610.0, ans=0.1 2024-08-13 16:12:30,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2204710.0, ans=0.0 2024-08-13 16:12:33,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2204710.0, ans=0.07 2024-08-13 16:12:34,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-08-13 16:12:35,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3100, loss[loss=0.1077, beats_loss=0.0102, ecapa_loss=0.0001653, whisper_loss=0.09581, over 22459.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.0001634, whisper_loss=0.09199, over 3899826.20 frames. ], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:13:06,444 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 16:13:14,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.40 vs. 
limit=12.0 2024-08-13 16:13:29,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2205110.0, ans=0.5 2024-08-13 16:13:40,146 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-13 16:13:45,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3150, loss[loss=0.1142, beats_loss=0.0127, ecapa_loss=0.0001185, whisper_loss=0.1004, over 14939.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0001632, whisper_loss=0.09185, over 3867896.31 frames. ], batch size: 56, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:13:53,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-13 16:14:00,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.370e+01 2.702e+01 3.002e+01 4.700e+01, threshold=5.405e+01, percent-clipped=0.0 2024-08-13 16:14:12,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2205510.0, ans=0.2 2024-08-13 16:14:18,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2205510.0, ans=0.1 2024-08-13 16:14:20,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2205510.0, ans=0.1 2024-08-13 16:14:35,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2205610.0, ans=0.125 2024-08-13 16:14:46,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-08-13 16:14:55,567 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 16:14:56,993 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3200, loss[loss=0.1189, beats_loss=0.008147, ecapa_loss=0.0001692, whisper_loss=0.1091, over 19960.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01071, ecapa_loss=0.0001639, whisper_loss=0.09269, over 3861301.91 frames. ], batch size: 78, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:15:06,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2024-08-13 16:15:14,132 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 16:15:18,493 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 16:15:25,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2205910.0, ans=0.1 2024-08-13 16:15:27,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2206010.0, ans=0.125 2024-08-13 16:15:27,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2024-08-13 16:15:29,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2206010.0, ans=0.0 2024-08-13 16:15:30,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0 2024-08-13 16:15:31,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2206010.0, ans=0.125 2024-08-13 16:15:36,460 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-13 16:15:38,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2206010.0, ans=15.0 2024-08-13 16:15:45,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.43 vs. limit=22.5 2024-08-13 16:15:50,444 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 16:15:50,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-13 16:15:55,353 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 16:16:01,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2206210.0, ans=0.95 2024-08-13 16:16:02,299 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 23 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-13 16:16:10,666 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3250, loss[loss=0.1184, beats_loss=0.01027, ecapa_loss=0.0001722, whisper_loss=0.1064, over 22344.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01076, ecapa_loss=0.000163, whisper_loss=0.09262, over 3890490.62 frames. ], batch size: 92, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:16:18,951 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-13 16:16:22,116 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.80 vs. 
limit=22.5 2024-08-13 16:16:24,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2206410.0, ans=0.125 2024-08-13 16:16:25,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.428e+01 2.754e+01 3.023e+01 4.086e+01, threshold=5.507e+01, percent-clipped=0.0 2024-08-13 16:16:34,908 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 16:16:35,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-13 16:16:42,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2206510.0, ans=0.1 2024-08-13 16:16:52,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0 2024-08-13 16:17:22,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3300, loss[loss=0.103, beats_loss=0.009558, ecapa_loss=0.0001847, whisper_loss=0.09159, over 17819.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01064, ecapa_loss=0.000164, whisper_loss=0.09309, over 3859720.52 frames. ], batch size: 70, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:17:24,040 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-13 16:17:26,451 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 16:17:29,609 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-13 16:17:39,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2206910.0, ans=0.125 2024-08-13 16:17:40,786 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 16:18:04,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-08-13 16:18:07,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2207110.0, ans=0.125 2024-08-13 16:18:13,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2207110.0, ans=0.1 2024-08-13 16:18:15,179 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 16:18:32,064 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 16:18:32,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2024-08-13 16:18:33,277 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3350, loss[loss=0.1137, beats_loss=0.009605, ecapa_loss=0.0001505, whisper_loss=0.1026, over 23771.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01062, ecapa_loss=0.0001643, whisper_loss=0.09266, over 3883338.30 frames. ], batch size: 91, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:18:45,778 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 16:18:46,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2207310.0, ans=0.125 2024-08-13 16:18:48,566 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 16:18:49,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.421e+01 2.639e+01 2.919e+01 4.017e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-13 16:18:50,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2207410.0, ans=0.1 2024-08-13 16:18:51,677 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 16:18:56,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2207410.0, ans=0.015 2024-08-13 16:18:59,617 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 16:19:00,682 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 16:19:08,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2207510.0, ans=0.125 2024-08-13 16:19:11,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2207510.0, ans=0.125 2024-08-13 16:19:32,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2207710.0, ans=0.0 2024-08-13 16:19:45,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3400, loss[loss=0.1049, beats_loss=0.00957, ecapa_loss=0.0002176, whisper_loss=0.09315, over 22270.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01065, ecapa_loss=0.0001632, whisper_loss=0.09283, over 3898115.09 frames. ], batch size: 93, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:20:18,286 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-13 16:20:31,631 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.72 vs. limit=10.0 2024-08-13 16:20:36,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2208110.0, ans=0.125 2024-08-13 16:20:45,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2208210.0, ans=0.125 2024-08-13 16:20:56,403 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3450, loss[loss=0.1199, beats_loss=0.01057, ecapa_loss=0.0001661, whisper_loss=0.1077, over 21202.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001634, whisper_loss=0.09216, over 3888080.21 frames. ], batch size: 82, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:21:06,623 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
27 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-13 16:21:11,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.477e+01 2.807e+01 3.322e+01 1.527e+02, threshold=5.614e+01, percent-clipped=5.0 2024-08-13 16:21:28,823 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.518e+05 2024-08-13 16:21:28,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2208510.0, ans=0.125 2024-08-13 16:21:38,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2208610.0, ans=0.1 2024-08-13 16:21:39,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2208610.0, ans=0.0 2024-08-13 16:21:43,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2208610.0, ans=0.1 2024-08-13 16:21:47,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2208610.0, ans=0.1 2024-08-13 16:22:06,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3500, loss[loss=0.1067, beats_loss=0.01093, ecapa_loss=0.000168, whisper_loss=0.09411, over 21593.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.0001637, whisper_loss=0.09203, over 3892871.73 frames. 
], batch size: 90, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:22:07,520 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.096e-03 2024-08-13 16:22:11,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2208810.0, ans=0.125 2024-08-13 16:22:17,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2208810.0, ans=0.125 2024-08-13 16:22:19,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2208810.0, ans=0.125 2024-08-13 16:22:27,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2208910.0, ans=0.1 2024-08-13 16:22:28,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2208910.0, ans=0.125 2024-08-13 16:22:38,336 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 16:22:43,657 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-13 16:22:50,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2209110.0, ans=0.1 2024-08-13 16:23:00,972 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 16:23:20,194 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3550, loss[loss=0.1157, beats_loss=0.008805, ecapa_loss=0.0001749, whisper_loss=0.1052, over 20138.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01069, ecapa_loss=0.0001639, whisper_loss=0.09168, over 3900287.35 frames. 
], batch size: 77, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:23:32,083 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 11 from Vox, 42 fro AS 2024-08-13 16:23:36,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.460e+01 2.758e+01 3.003e+01 5.341e+01, threshold=5.516e+01, percent-clipped=0.0 2024-08-13 16:24:07,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2209610.0, ans=0.0 2024-08-13 16:24:27,534 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 16:24:33,635 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 16:24:34,748 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3600, loss[loss=0.1123, beats_loss=0.008743, ecapa_loss=0.0001669, whisper_loss=0.1019, over 16558.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001635, whisper_loss=0.09098, over 3872528.65 frames. ], batch size: 64, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:24:34,908 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-13 16:24:38,044 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 16:24:40,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.22 vs. limit=10.0 2024-08-13 16:24:50,165 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-13 16:25:09,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2210010.0, ans=0.1 2024-08-13 16:25:17,700 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
21 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-13 16:25:18,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-08-13 16:25:46,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-13 16:25:46,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3650, loss[loss=0.1143, beats_loss=0.009154, ecapa_loss=0.0001449, whisper_loss=0.1037, over 14848.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001637, whisper_loss=0.09076, over 3851858.56 frames. ], batch size: 54, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:25:47,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-13 16:26:02,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.319e+01 2.686e+01 3.119e+01 4.845e+01, threshold=5.372e+01, percent-clipped=0.0 2024-08-13 16:26:23,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2210510.0, ans=0.0 2024-08-13 16:26:32,837 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 16:26:35,657 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 16:26:43,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. 
limit=15.0 2024-08-13 16:26:48,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2210710.0, ans=0.0 2024-08-13 16:26:52,155 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-13 16:26:55,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-08-13 16:26:56,166 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3700, loss[loss=0.1049, beats_loss=0.007921, ecapa_loss=0.0001614, whisper_loss=0.09534, over 17419.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001646, whisper_loss=0.09097, over 3861074.71 frames. ], batch size: 64, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:27:19,736 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 16:27:20,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2210910.0, ans=0.125 2024-08-13 16:27:22,900 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=12.0 2024-08-13 16:27:27,960 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 16:27:41,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2211110.0, ans=0.125 2024-08-13 16:28:03,121 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3750, loss[loss=0.1064, beats_loss=0.01139, ecapa_loss=0.0001762, whisper_loss=0.09325, over 21882.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001635, whisper_loss=0.09032, over 3830808.12 frames. 
], batch size: 91, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:28:12,431 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 16:28:17,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.410e+01 2.677e+01 3.009e+01 6.113e+01, threshold=5.354e+01, percent-clipped=1.0 2024-08-13 16:28:36,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2211510.0, ans=0.1 2024-08-13 16:28:59,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2211710.0, ans=0.125 2024-08-13 16:29:04,353 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 16:29:04,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2211710.0, ans=0.1 2024-08-13 16:29:08,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3800, loss[loss=0.1264, beats_loss=0.008527, ecapa_loss=0.0001276, whisper_loss=0.1166, over 16060.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01086, ecapa_loss=0.0001645, whisper_loss=0.09017, over 3854068.01 frames. ], batch size: 55, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:29:11,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2211810.0, ans=0.2 2024-08-13 16:29:11,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2024-08-13 16:29:29,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.69 vs. limit=22.5 2024-08-13 16:29:53,904 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
19 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 16:29:57,699 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 20 from LS+wenet, 22 from Vox, 53 fro AS 2024-08-13 16:29:58,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2212110.0, ans=0.125 2024-08-13 16:30:04,429 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 16:30:08,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2212210.0, ans=0.125 2024-08-13 16:30:12,110 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 16:30:13,169 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3850, loss[loss=0.1195, beats_loss=0.01099, ecapa_loss=0.0001946, whisper_loss=0.1065, over 22478.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001637, whisper_loss=0.09076, over 3866296.92 frames. ], batch size: 92, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:30:17,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2212310.0, ans=0.125 2024-08-13 16:30:24,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2212310.0, ans=0.125 2024-08-13 16:30:27,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.456e+01 2.760e+01 3.181e+01 8.437e+01, threshold=5.521e+01, percent-clipped=2.0 2024-08-13 16:30:47,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2212510.0, ans=0.1 2024-08-13 16:31:17,484 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 16:31:18,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3900, loss[loss=0.1042, beats_loss=0.01192, ecapa_loss=0.0001847, whisper_loss=0.09048, over 21622.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01085, ecapa_loss=0.0001633, whisper_loss=0.09196, over 3879437.85 frames. ], batch size: 91, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:31:26,919 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-13 16:31:30,681 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 16:31:31,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2212910.0, ans=0.1 2024-08-13 16:31:31,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-13 16:31:33,535 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 16:31:37,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2212910.0, ans=0.125 2024-08-13 16:31:38,375 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 16:31:42,278 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 16:31:43,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2213010.0, ans=10.0 2024-08-13 16:31:50,011 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
37 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 16:32:14,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2213210.0, ans=0.125 2024-08-13 16:32:22,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2213310.0, ans=0.05 2024-08-13 16:32:23,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 3950, loss[loss=0.1077, beats_loss=0.01159, ecapa_loss=0.0001621, whisper_loss=0.09448, over 21102.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01071, ecapa_loss=0.0001651, whisper_loss=0.09279, over 3895906.20 frames. ], batch size: 88, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:32:24,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2213310.0, ans=0.125 2024-08-13 16:32:37,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.503e+01 2.824e+01 3.168e+01 4.630e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 16:32:41,683 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 16:32:41,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2213410.0, ans=0.2 2024-08-13 16:32:55,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-13 16:33:09,088 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 16:33:13,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2213610.0, ans=0.125 2024-08-13 16:33:25,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2213710.0, ans=0.125 2024-08-13 16:33:28,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4000, loss[loss=0.1055, beats_loss=0.01326, ecapa_loss=0.000119, whisper_loss=0.09103, over 20159.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01069, ecapa_loss=0.0001644, whisper_loss=0.09317, over 3908836.41 frames. ], batch size: 78, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:33:47,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2213910.0, ans=0.125 2024-08-13 16:33:51,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2213910.0, ans=0.1 2024-08-13 16:34:01,471 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-13 16:34:01,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2214010.0, ans=0.125 2024-08-13 16:34:05,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2214010.0, ans=0.125 2024-08-13 16:34:11,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2214110.0, ans=0.125 2024-08-13 16:34:13,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2214110.0, ans=0.0 2024-08-13 16:34:23,739 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
27 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 16:34:25,145 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 16:34:25,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2214210.0, ans=0.125 2024-08-13 16:34:29,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2214210.0, ans=0.125 2024-08-13 16:34:33,794 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4050, loss[loss=0.1183, beats_loss=0.007909, ecapa_loss=0.0001773, whisper_loss=0.1086, over 23059.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01078, ecapa_loss=0.0001641, whisper_loss=0.09265, over 3897003.07 frames. ], batch size: 91, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:34:39,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2214310.0, ans=0.125 2024-08-13 16:34:40,996 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 16:34:45,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2214310.0, ans=0.125 2024-08-13 16:34:48,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.510e+01 2.777e+01 3.045e+01 5.508e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-13 16:34:50,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2214410.0, ans=0.125 2024-08-13 16:34:53,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2214410.0, ans=0.1 2024-08-13 16:34:58,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-08-13 16:35:17,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2214610.0, ans=0.0 2024-08-13 16:35:28,633 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-13 16:35:31,874 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2024-08-13 16:35:33,927 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 16 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-13 16:35:38,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-13 16:35:38,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4100, loss[loss=0.08568, beats_loss=0.01301, ecapa_loss=0.0001404, whisper_loss=0.07127, over 20533.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01079, ecapa_loss=0.0001646, whisper_loss=0.09226, over 3892850.33 frames. ], batch size: 83, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:35:43,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2214810.0, ans=0.2 2024-08-13 16:36:01,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2214910.0, ans=0.05 2024-08-13 16:36:15,540 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 16:36:30,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2215210.0, ans=0.2 2024-08-13 16:36:35,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2215210.0, ans=0.0 2024-08-13 16:36:36,274 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 16:36:43,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4150, loss[loss=0.1283, beats_loss=0.009861, ecapa_loss=0.0001576, whisper_loss=0.1169, over 18744.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001637, whisper_loss=0.09183, over 3884598.87 frames. 
], batch size: 71, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:36:44,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2215310.0, ans=0.0 2024-08-13 16:36:51,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2215310.0, ans=0.125 2024-08-13 16:36:57,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.337e+01 2.557e+01 2.975e+01 8.257e+01, threshold=5.114e+01, percent-clipped=2.0 2024-08-13 16:37:14,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2215510.0, ans=0.125 2024-08-13 16:37:20,496 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 16:37:34,534 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-13 16:37:40,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2215710.0, ans=0.125 2024-08-13 16:37:42,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2215710.0, ans=0.125 2024-08-13 16:37:42,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2215710.0, ans=0.1 2024-08-13 16:37:48,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4200, loss[loss=0.0811, beats_loss=0.01153, ecapa_loss=0.0001415, whisper_loss=0.06815, over 21597.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001652, whisper_loss=0.09225, over 3882653.29 frames. 
], batch size: 89, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:37:55,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2215810.0, ans=0.2 2024-08-13 16:38:08,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2215910.0, ans=0.05 2024-08-13 16:38:12,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2215910.0, ans=0.0 2024-08-13 16:38:25,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-13 16:38:38,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2216110.0, ans=10.0 2024-08-13 16:38:52,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5 2024-08-13 16:38:56,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4250, loss[loss=0.1142, beats_loss=0.012, ecapa_loss=0.0001705, whisper_loss=0.1004, over 22321.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001646, whisper_loss=0.09052, over 3877512.66 frames. 
], batch size: 92, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:39:12,433 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.639e+01 2.854e+01 4.176e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-13 16:39:17,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2216410.0, ans=10.0 2024-08-13 16:39:19,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2216410.0, ans=0.0 2024-08-13 16:39:22,398 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-13 16:39:29,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2216510.0, ans=0.0 2024-08-13 16:39:38,242 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 16:39:45,303 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 16:40:04,262 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 16:40:11,892 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4300, loss[loss=0.1072, beats_loss=0.01119, ecapa_loss=0.0001664, whisper_loss=0.09434, over 22701.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001632, whisper_loss=0.09021, over 3862749.10 frames. ], batch size: 90, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:40:23,826 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 16:40:26,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.85 vs. 
limit=15.0 2024-08-13 16:40:39,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2216910.0, ans=0.2 2024-08-13 16:40:55,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2217010.0, ans=0.125 2024-08-13 16:41:17,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2217210.0, ans=0.0 2024-08-13 16:41:24,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2217210.0, ans=0.1 2024-08-13 16:41:27,565 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4350, loss[loss=0.1225, beats_loss=0.008972, ecapa_loss=0.0001674, whisper_loss=0.1118, over 22785.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001657, whisper_loss=0.09094, over 3861382.00 frames. ], batch size: 90, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:41:34,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2217310.0, ans=0.0 2024-08-13 16:41:36,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2217310.0, ans=0.125 2024-08-13 16:41:41,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.464e+01 2.794e+01 3.090e+01 4.694e+01, threshold=5.588e+01, percent-clipped=0.0 2024-08-13 16:41:43,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2217410.0, ans=0.125 2024-08-13 16:41:49,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2217410.0, ans=0.125 2024-08-13 16:41:51,315 INFO [train_multi_KD3.py:844] (1/4) A total of 90 
cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 16:42:01,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2217510.0, ans=0.95 2024-08-13 16:42:02,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2217510.0, ans=0.0 2024-08-13 16:42:06,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2217610.0, ans=0.125 2024-08-13 16:42:08,883 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-13 16:42:20,188 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.482e+01 2024-08-13 16:42:31,689 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 16:42:32,838 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4400, loss[loss=0.1127, beats_loss=0.01052, ecapa_loss=0.0001576, whisper_loss=0.1006, over 17445.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001667, whisper_loss=0.09085, over 3839491.60 frames. ], batch size: 69, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:42:33,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2217810.0, ans=0.125 2024-08-13 16:42:34,227 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
15 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 16:42:37,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2217810.0, ans=0.125 2024-08-13 16:42:38,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2217810.0, ans=0.125 2024-08-13 16:42:43,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2217810.0, ans=0.125 2024-08-13 16:42:47,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2217910.0, ans=0.125 2024-08-13 16:43:10,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2218110.0, ans=10.0 2024-08-13 16:43:33,514 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 16:43:37,276 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4450, loss[loss=0.1088, beats_loss=0.01288, ecapa_loss=0.0001412, whisper_loss=0.09452, over 20434.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001653, whisper_loss=0.09153, over 3847467.53 frames. 
], batch size: 81, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:43:49,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2218410.0, ans=0.125 2024-08-13 16:43:50,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2218410.0, ans=0.0 2024-08-13 16:43:51,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.377e+01 2.551e+01 3.070e+01 5.212e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-13 16:43:57,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2218410.0, ans=0.125 2024-08-13 16:44:10,174 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 16:44:32,915 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-13 16:44:38,146 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 16:44:38,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2218710.0, ans=0.05 2024-08-13 16:44:41,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4500, loss[loss=0.1204, beats_loss=0.009465, ecapa_loss=0.0001537, whisper_loss=0.1094, over 14563.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001652, whisper_loss=0.09146, over 3844032.24 frames. ], batch size: 57, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:44:43,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2218810.0, ans=0.125 2024-08-13 16:44:51,781 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. 
limit=6.0 2024-08-13 16:44:53,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2218910.0, ans=0.07 2024-08-13 16:45:02,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2218910.0, ans=0.125 2024-08-13 16:45:12,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2219010.0, ans=0.125 2024-08-13 16:45:28,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2219110.0, ans=0.125 2024-08-13 16:45:32,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2219110.0, ans=0.125 2024-08-13 16:45:39,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2219210.0, ans=0.0 2024-08-13 16:45:49,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2219210.0, ans=0.0 2024-08-13 16:45:55,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4550, loss[loss=0.1209, beats_loss=0.008705, ecapa_loss=0.0001523, whisper_loss=0.1107, over 19682.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001654, whisper_loss=0.09089, over 3846375.30 frames. ], batch size: 75, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:46:12,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.447e+01 2.788e+01 3.187e+01 5.560e+01, threshold=5.575e+01, percent-clipped=2.0 2024-08-13 16:46:21,742 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 16:46:25,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.47 vs. 
limit=22.5 2024-08-13 16:46:48,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2219610.0, ans=0.1 2024-08-13 16:46:54,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2219610.0, ans=0.125 2024-08-13 16:47:12,595 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 16:47:12,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2219710.0, ans=0.125 2024-08-13 16:47:24,277 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 16:47:31,893 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4600, loss[loss=0.08244, beats_loss=0.009904, ecapa_loss=0.0001789, whisper_loss=0.07075, over 14136.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.000164, whisper_loss=0.09083, over 3856716.96 frames. ], batch size: 55, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:47:32,232 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 16:47:46,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2219810.0, ans=0.1 2024-08-13 16:47:49,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2219810.0, ans=0.125 2024-08-13 16:48:01,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. 
limit=15.0 2024-08-13 16:48:03,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2219910.0, ans=0.1 2024-08-13 16:48:05,965 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 16:48:13,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2219910.0, ans=0.015 2024-08-13 16:48:20,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2220010.0, ans=0.125 2024-08-13 16:48:33,353 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 16:49:07,030 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.987e-02 2024-08-13 16:49:24,663 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4650, loss[loss=0.1105, beats_loss=0.009045, ecapa_loss=0.0001466, whisper_loss=0.09997, over 17459.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001635, whisper_loss=0.09083, over 3862015.27 frames. ], batch size: 67, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:49:34,104 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 16:49:46,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2220410.0, ans=0.025 2024-08-13 16:49:50,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.519e+01 2.721e+01 2.978e+01 4.976e+01, threshold=5.443e+01, percent-clipped=0.0 2024-08-13 16:49:57,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2220410.0, ans=0.0 2024-08-13 16:50:01,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2220410.0, ans=0.125 2024-08-13 16:50:04,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2220410.0, ans=0.05 2024-08-13 16:50:19,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2220510.0, ans=0.125 2024-08-13 16:50:37,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2220610.0, ans=0.0 2024-08-13 16:51:08,760 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 16:51:19,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4700, loss[loss=0.09418, beats_loss=0.0113, ecapa_loss=0.0001449, whisper_loss=0.08143, over 21938.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001641, whisper_loss=0.09118, over 3873111.14 frames. ], batch size: 86, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:51:33,624 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 16:51:34,999 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
13 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 16:51:50,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2220910.0, ans=0.0 2024-08-13 16:52:20,412 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.816e+01 2024-08-13 16:52:55,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2221210.0, ans=0.125 2024-08-13 16:52:58,087 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 16:53:01,947 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4750, loss[loss=0.09019, beats_loss=0.01106, ecapa_loss=0.0001816, whisper_loss=0.07731, over 20597.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001631, whisper_loss=0.0912, over 3877642.36 frames. ], batch size: 88, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:53:17,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.389e+01 2.725e+01 3.065e+01 4.342e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 16:53:30,099 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 16:53:36,616 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 16:54:02,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2221710.0, ans=0.2 2024-08-13 16:54:02,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2221710.0, ans=0.0 2024-08-13 16:54:03,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.46 vs. 
limit=15.0
2024-08-13 16:54:14,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4800, loss[loss=0.105, beats_loss=0.009863, ecapa_loss=0.0002009, whisper_loss=0.09317, over 22597.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001633, whisper_loss=0.09105, over 3897003.19 frames. ], batch size: 93, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:54:24,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2221810.0, ans=0.1
2024-08-13 16:54:38,697 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 from AS
2024-08-13 16:54:47,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2222010.0, ans=0.125
2024-08-13 16:54:51,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0
2024-08-13 16:55:08,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2222110.0, ans=0.1
2024-08-13 16:55:09,939 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 from AS
2024-08-13 16:55:12,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=12.0
2024-08-13 16:55:18,161 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts.
22 from LS+wenet, 13 from Vox, 28 from AS
2024-08-13 16:55:18,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2222210.0, ans=0.09899494936611666
2024-08-13 16:55:21,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2222210.0, ans=0.1
2024-08-13 16:55:29,326 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 from AS
2024-08-13 16:55:35,392 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4850, loss[loss=0.1169, beats_loss=0.009559, ecapa_loss=0.0001839, whisper_loss=0.1055, over 16842.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001641, whisper_loss=0.09153, over 3909229.65 frames. ], batch size: 67, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:55:40,681 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 20 from Vox, 34 from AS
2024-08-13 16:55:44,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2222310.0, ans=0.0
2024-08-13 16:55:52,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.475e+01 2.681e+01 3.157e+01 5.324e+01, threshold=5.362e+01, percent-clipped=0.0
2024-08-13 16:56:07,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2222510.0, ans=0.125
2024-08-13 16:56:14,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2222510.0, ans=0.2
2024-08-13 16:56:20,587 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts.
25 from LS+wenet, 19 from Vox, 38 from AS
2024-08-13 16:56:23,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2222610.0, ans=0.125
2024-08-13 16:56:36,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2222710.0, ans=15.0
2024-08-13 16:56:37,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=12.0
2024-08-13 16:56:38,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2222710.0, ans=0.2
2024-08-13 16:56:49,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2222710.0, ans=0.1
2024-08-13 16:56:51,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4900, loss[loss=0.08043, beats_loss=0.01268, ecapa_loss=0.000185, whisper_loss=0.06591, over 21058.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001631, whisper_loss=0.09105, over 3876640.70 frames. ], batch size: 91, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:56:53,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2222810.0, ans=0.2
2024-08-13 16:57:01,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.11 vs.
limit=15.0
2024-08-13 16:57:06,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2222910.0, ans=0.1
2024-08-13 16:57:09,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2222910.0, ans=0.125
2024-08-13 16:57:14,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2222910.0, ans=0.125
2024-08-13 16:57:14,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2222910.0, ans=0.1
2024-08-13 16:57:25,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2223010.0, ans=0.1
2024-08-13 16:57:41,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0
2024-08-13 16:57:51,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2223110.0, ans=0.125
2024-08-13 16:57:53,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0
2024-08-13 16:57:56,747 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 16:58:08,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 4950, loss[loss=0.09798, beats_loss=0.01041, ecapa_loss=0.0001742, whisper_loss=0.08583, over 22500.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01084, ecapa_loss=0.0001649, whisper_loss=0.09044, over 3846454.78 frames.
], batch size: 91, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:58:23,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2223410.0, ans=0.125
2024-08-13 16:58:26,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.318e+01 2.496e+01 2.819e+01 1.833e+02, threshold=4.991e+01, percent-clipped=1.0
2024-08-13 16:58:29,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=15.0
2024-08-13 16:59:01,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2223610.0, ans=0.2
2024-08-13 16:59:22,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2223710.0, ans=0.125
2024-08-13 16:59:26,555 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5000, loss[loss=0.1177, beats_loss=0.007147, ecapa_loss=0.0002137, whisper_loss=0.1084, over 18141.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001639, whisper_loss=0.09063, over 3866089.67 frames.
], batch size: 74, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:59:36,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2223810.0, ans=0.0
2024-08-13 16:59:37,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2223810.0, ans=0.1
2024-08-13 16:59:37,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2223810.0, ans=0.125
2024-08-13 16:59:42,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2223910.0, ans=0.1
2024-08-13 16:59:53,343 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 from AS
2024-08-13 16:59:57,957 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 13 from Vox, 32 from AS
2024-08-13 17:00:01,146 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 from AS
2024-08-13 17:00:25,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2224210.0, ans=0.025
2024-08-13 17:00:30,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.52 vs. limit=22.5
2024-08-13 17:00:35,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2224210.0, ans=0.125
2024-08-13 17:00:41,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5050, loss[loss=0.1069, beats_loss=0.01322, ecapa_loss=0.0001785, whisper_loss=0.09192, over 22730.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.011, ecapa_loss=0.0001631, whisper_loss=0.09059, over 3865231.83 frames.
], batch size: 94, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:00:56,190 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 from AS
2024-08-13 17:01:00,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=22.5
2024-08-13 17:01:00,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.341e+01 2.653e+01 3.152e+01 4.271e+01, threshold=5.307e+01, percent-clipped=0.0
2024-08-13 17:01:08,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2224410.0, ans=0.5
2024-08-13 17:01:30,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2224610.0, ans=0.125
2024-08-13 17:01:33,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2224610.0, ans=0.125
2024-08-13 17:01:33,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=15.0
2024-08-13 17:01:39,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2224610.0, ans=0.125
2024-08-13 17:01:47,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2224710.0, ans=0.2
2024-08-13 17:01:51,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2224710.0, ans=0.0
2024-08-13 17:01:57,619 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5100, loss[loss=0.07376, beats_loss=0.01579, ecapa_loss=0.0001173, whisper_loss=0.05679, over 13883.00 frames.
], tot_loss[loss=0.1036, beats_loss=0.01104, ecapa_loss=0.0001625, whisper_loss=0.09097, over 3881921.33 frames. ], batch size: 55, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:02:07,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=2224810.0, ans=12.0
2024-08-13 17:02:12,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0
2024-08-13 17:02:16,011 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 from AS
2024-08-13 17:02:17,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2224910.0, ans=0.0
2024-08-13 17:02:18,960 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 34 from LS+wenet, 15 from Vox, 26 from AS
2024-08-13 17:02:19,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2224910.0, ans=0.0
2024-08-13 17:02:26,033 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 from AS
2024-08-13 17:02:34,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2225010.0, ans=0.2
2024-08-13 17:02:41,700 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts.
15 from LS+wenet, 19 from Vox, 23 from AS
2024-08-13 17:02:45,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2225110.0, ans=0.1
2024-08-13 17:02:55,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2225110.0, ans=0.025
2024-08-13 17:02:58,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2225210.0, ans=15.0
2024-08-13 17:02:59,922 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-13 17:03:12,567 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 from AS
2024-08-13 17:03:14,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5150, loss[loss=0.09963, beats_loss=0.01128, ecapa_loss=0.0001684, whisper_loss=0.08667, over 20170.00 frames. ], tot_loss[loss=0.103, beats_loss=0.011, ecapa_loss=0.0001623, whisper_loss=0.09041, over 3869107.31 frames. ], batch size: 83, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:03:18,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2225310.0, ans=0.125
2024-08-13 17:03:29,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.384e+01 2.654e+01 2.972e+01 6.587e+01, threshold=5.307e+01, percent-clipped=1.0
2024-08-13 17:03:44,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=12.0
2024-08-13 17:03:47,069 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts.
25 from LS+wenet, 16 from Vox, 28 from AS
2024-08-13 17:03:57,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2225610.0, ans=0.125
2024-08-13 17:04:28,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5200, loss[loss=0.101, beats_loss=0.01309, ecapa_loss=0.0001576, whisper_loss=0.08631, over 21358.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.011, ecapa_loss=0.0001614, whisper_loss=0.09085, over 3852896.20 frames. ], batch size: 88, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:04:33,871 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 from AS
2024-08-13 17:04:47,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2225910.0, ans=0.1
2024-08-13 17:04:49,125 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5
2024-08-13 17:04:55,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.29 vs. limit=22.5
2024-08-13 17:05:12,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2226110.0, ans=0.2
2024-08-13 17:05:15,889 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0
2024-08-13 17:05:41,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5250, loss[loss=0.0936, beats_loss=0.0119, ecapa_loss=0.0001898, whisper_loss=0.0798, over 13656.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001633, whisper_loss=0.09062, over 3866719.41 frames.
], batch size: 57, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:05:58,276 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.417e+01 2.576e+01 2.914e+01 4.655e+01, threshold=5.152e+01, percent-clipped=0.0
2024-08-13 17:05:59,027 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.493e-01
2024-08-13 17:06:03,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2226410.0, ans=0.0
2024-08-13 17:06:09,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2226410.0, ans=0.05
2024-08-13 17:06:27,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2226610.0, ans=0.125
2024-08-13 17:06:32,823 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 17:06:39,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2226610.0, ans=0.0
2024-08-13 17:06:40,742 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 from AS
2024-08-13 17:06:54,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-08-13 17:06:56,351 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 from AS
2024-08-13 17:06:59,311 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5300, loss[loss=0.09853, beats_loss=0.01039, ecapa_loss=0.0001677, whisper_loss=0.08647, over 15834.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.000163, whisper_loss=0.09083, over 3861032.01 frames.
], batch size: 64, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:07:01,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.88 vs. limit=15.0
2024-08-13 17:07:14,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=12.0
2024-08-13 17:07:38,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=22.5
2024-08-13 17:07:41,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2227010.0, ans=0.0
2024-08-13 17:07:44,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2227110.0, ans=0.125
2024-08-13 17:08:10,847 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 7 from LS+wenet, 20 from Vox, 28 from AS
2024-08-13 17:08:16,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5350, loss[loss=0.1134, beats_loss=0.009611, ecapa_loss=0.0001748, whisper_loss=0.1021, over 20364.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001624, whisper_loss=0.09097, over 3867203.69 frames. ], batch size: 82, lr: 3.94e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:08:21,528 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-13 17:08:32,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2227410.0, ans=0.0
2024-08-13 17:08:34,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.323e+01 2.552e+01 2.858e+01 4.460e+01, threshold=5.104e+01, percent-clipped=0.0
2024-08-13 17:08:56,579 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts.
32 from LS+wenet, 23 from Vox, 35 from AS
2024-08-13 17:09:15,915 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS
2024-08-13 17:09:16,158 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:09:35,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5400, loss[loss=0.1195, beats_loss=0.009847, ecapa_loss=0.0001958, whisper_loss=0.1077, over 22270.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.000162, whisper_loss=0.09175, over 3874008.63 frames. ], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:09:49,602 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 16 from LS+wenet, 25 from Vox, 34 from AS
2024-08-13 17:09:52,985 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 from AS
2024-08-13 17:09:55,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2227910.0, ans=0.0
2024-08-13 17:10:23,459 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 from AS
2024-08-13 17:10:27,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0
2024-08-13 17:10:29,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0
2024-08-13 17:10:44,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2228210.0, ans=0.1
2024-08-13 17:10:47,131 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts.
21 from LS+wenet, 11 from Vox, 32 from AS
2024-08-13 17:10:53,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5450, loss[loss=0.07724, beats_loss=0.01146, ecapa_loss=0.0001178, whisper_loss=0.0646, over 15492.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001629, whisper_loss=0.09179, over 3858092.80 frames. ], batch size: 58, lr: 3.93e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:10:56,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2228310.0, ans=0.0
2024-08-13 17:11:01,413 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:11:01,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2228310.0, ans=0.2
2024-08-13 17:11:07,975 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 9 from Vox, 25 from AS
2024-08-13 17:11:11,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.403e+01 2.600e+01 2.908e+01 1.736e+02, threshold=5.201e+01, percent-clipped=2.0
2024-08-13 17:11:17,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.41 vs. limit=10.0
2024-08-13 17:11:22,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2228410.0, ans=0.0
2024-08-13 17:11:40,101 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 from AS
2024-08-13 17:11:45,995 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts.
25 from LS+wenet, 16 from Vox, 29 from AS
2024-08-13 17:12:06,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2228710.0, ans=0.0
2024-08-13 17:12:07,796 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 17:12:12,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5500, loss[loss=0.08265, beats_loss=0.01164, ecapa_loss=0.0001473, whisper_loss=0.06954, over 20469.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01086, ecapa_loss=0.0001622, whisper_loss=0.09179, over 3880874.70 frames. ], batch size: 81, lr: 3.93e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:12:24,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2228810.0, ans=0.125
2024-08-13 17:12:36,411 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 19 from Vox, 25 from AS
2024-08-13 17:12:47,169 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts.
16 from LS+wenet, 14 from Vox, 29 from AS
2024-08-13 17:12:51,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2229010.0, ans=0.125
2024-08-13 17:12:51,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2229010.0, ans=0.1
2024-08-13 17:12:52,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2229010.0, ans=0.125
2024-08-13 17:12:54,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2229010.0, ans=15.0
2024-08-13 17:12:57,154 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.869e-03
2024-08-13 17:13:09,714 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 from AS
2024-08-13 17:13:19,581 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 from AS
2024-08-13 17:13:24,705 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 from AS
2024-08-13 17:13:29,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2229310.0, ans=0.0
2024-08-13 17:13:30,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5550, loss[loss=0.1236, beats_loss=0.009692, ecapa_loss=0.0001803, whisper_loss=0.1121, over 22432.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001622, whisper_loss=0.09129, over 3892569.36 frames. ], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:13:33,145 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts.
23 from LS+wenet, 20 from Vox, 42 from AS
2024-08-13 17:13:41,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2229310.0, ans=0.2
2024-08-13 17:13:46,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2229410.0, ans=0.125
2024-08-13 17:13:51,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.399e+01 2.709e+01 2.923e+01 5.241e+01, threshold=5.419e+01, percent-clipped=1.0
2024-08-13 17:13:52,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2229410.0, ans=0.125
2024-08-13 17:14:11,421 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0
2024-08-13 17:14:34,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2229610.0, ans=0.2
2024-08-13 17:14:39,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2229710.0, ans=0.125
2024-08-13 17:14:50,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2229710.0, ans=0.0
2024-08-13 17:14:54,202 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5600, loss[loss=0.09339, beats_loss=0.0144, ecapa_loss=0.0001298, whisper_loss=0.07769, over 17382.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01086, ecapa_loss=0.0001609, whisper_loss=0.09175, over 3921278.51 frames. ], batch size: 68, lr: 3.93e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:15:06,605 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts.
21 from LS+wenet, 19 from Vox, 43 from AS
2024-08-13 17:15:08,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2229810.0, ans=0.1
2024-08-13 17:15:27,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0
2024-08-13 17:15:34,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0
2024-08-13 17:15:35,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2230010.0, ans=0.04949747468305833
2024-08-13 17:15:51,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2230110.0, ans=0.5
2024-08-13 17:15:56,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2230210.0, ans=0.125
2024-08-13 17:15:57,980 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 from AS
2024-08-13 17:16:02,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2230210.0, ans=0.0
2024-08-13 17:16:11,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5650, loss[loss=0.105, beats_loss=0.0124, ecapa_loss=0.0001448, whisper_loss=0.09114, over 22999.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001627, whisper_loss=0.09121, over 3927261.16 frames. ], batch size: 93, lr: 3.93e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:16:11,242 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts.
26 from LS+wenet, 21 from Vox, 31 from AS
2024-08-13 17:16:17,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2230310.0, ans=0.0
2024-08-13 17:16:27,270 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:16:29,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.435e+01 2.742e+01 3.035e+01 1.015e+02, threshold=5.483e+01, percent-clipped=1.0
2024-08-13 17:16:29,790 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 from AS
2024-08-13 17:16:31,452 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 14 from LS+wenet, 16 from Vox, 36 from AS
2024-08-13 17:16:35,736 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 17 from Vox, 28 from AS
2024-08-13 17:16:39,296 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:16:45,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2230510.0, ans=0.125
2024-08-13 17:16:48,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2230510.0, ans=0.0
2024-08-13 17:16:52,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2230510.0, ans=0.125
2024-08-13 17:17:09,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2230610.0, ans=0.0
2024-08-13 17:17:10,946 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 17:17:16,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.32 vs.
limit=15.0 2024-08-13 17:17:22,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=8.0 2024-08-13 17:17:29,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5700, loss[loss=0.1, beats_loss=0.008989, ecapa_loss=0.0001594, whisper_loss=0.08943, over 14090.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0109, ecapa_loss=0.0001634, whisper_loss=0.09065, over 3924927.92 frames. ], batch size: 56, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:17:37,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2230810.0, ans=0.125 2024-08-13 17:17:42,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2230810.0, ans=0.1 2024-08-13 17:17:50,258 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 17:17:56,334 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 17:18:00,826 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 17:18:08,725 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 17:18:21,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=12.0 2024-08-13 17:18:48,327 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5750, loss[loss=0.1005, beats_loss=0.008775, ecapa_loss=0.0001305, whisper_loss=0.09041, over 14848.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01085, ecapa_loss=0.000163, whisper_loss=0.09077, over 3921460.04 frames. 
], batch size: 53, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:19:05,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2231410.0, ans=0.2 2024-08-13 17:19:06,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2231410.0, ans=0.1 2024-08-13 17:19:07,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+01 2.349e+01 2.635e+01 2.966e+01 1.104e+02, threshold=5.269e+01, percent-clipped=1.0 2024-08-13 17:19:17,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2231410.0, ans=0.1 2024-08-13 17:19:22,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2231510.0, ans=0.125 2024-08-13 17:19:33,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2231510.0, ans=0.0 2024-08-13 17:20:05,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5800, loss[loss=0.09968, beats_loss=0.01136, ecapa_loss=0.000156, whisper_loss=0.08676, over 22662.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01084, ecapa_loss=0.0001642, whisper_loss=0.09011, over 3888375.55 frames. ], batch size: 93, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:20:06,631 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 17:20:23,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2231910.0, ans=0.0 2024-08-13 17:20:44,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2232010.0, ans=0.0 2024-08-13 17:20:47,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=12.0 2024-08-13 17:20:54,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2232110.0, ans=0.125 2024-08-13 17:21:06,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=22.5 2024-08-13 17:21:11,128 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 29 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 17:21:11,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2232210.0, ans=0.0 2024-08-13 17:21:18,664 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 17:21:19,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2232310.0, ans=0.07 2024-08-13 17:21:20,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5850, loss[loss=0.09877, beats_loss=0.01097, ecapa_loss=0.0001422, whisper_loss=0.08638, over 17026.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001646, whisper_loss=0.0902, over 3883264.91 frames. 
], batch size: 67, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:21:31,081 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.973e+01 2024-08-13 17:21:31,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2232310.0, ans=0.125 2024-08-13 17:21:35,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2232410.0, ans=10.0 2024-08-13 17:21:37,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.300e+01 2.602e+01 2.847e+01 5.570e+01, threshold=5.204e+01, percent-clipped=1.0 2024-08-13 17:21:43,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2232410.0, ans=0.125 2024-08-13 17:22:00,694 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 17:22:21,327 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 17:22:33,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5900, loss[loss=0.0905, beats_loss=0.01105, ecapa_loss=0.0001556, whisper_loss=0.07789, over 19841.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0109, ecapa_loss=0.0001644, whisper_loss=0.08995, over 3901604.12 frames. ], batch size: 78, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:22:38,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2232810.0, ans=0.0 2024-08-13 17:22:52,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2232910.0, ans=0.07 2024-08-13 17:23:13,081 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 17:23:21,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2233110.0, ans=0.125 2024-08-13 17:23:26,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2233110.0, ans=0.125 2024-08-13 17:23:28,423 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=12.0 2024-08-13 17:23:33,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2233210.0, ans=0.125 2024-08-13 17:23:37,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-13 17:23:39,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2233210.0, ans=0.125 2024-08-13 17:23:39,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2233210.0, ans=0.0 2024-08-13 17:23:44,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 5950, loss[loss=0.08373, beats_loss=0.009123, ecapa_loss=0.0001662, whisper_loss=0.07295, over 13795.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01098, ecapa_loss=0.0001627, whisper_loss=0.08991, over 3905795.64 frames. 
], batch size: 53, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:24:01,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.407e+01 2.699e+01 3.092e+01 2.272e+02, threshold=5.398e+01, percent-clipped=4.0 2024-08-13 17:24:14,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2233510.0, ans=0.125 2024-08-13 17:24:14,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2233510.0, ans=0.04949747468305833 2024-08-13 17:24:14,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-13 17:24:35,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2233610.0, ans=0.125 2024-08-13 17:24:46,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2233710.0, ans=0.0 2024-08-13 17:24:52,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2233710.0, ans=0.125 2024-08-13 17:24:57,916 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6000, loss[loss=0.115, beats_loss=0.01041, ecapa_loss=0.0001802, whisper_loss=0.1028, over 18301.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01097, ecapa_loss=0.000163, whisper_loss=0.09021, over 3922785.34 frames. ], batch size: 74, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:24:57,916 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 17:25:32,958 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005624, whisper_loss=0.2475, over 922467.00 frames. 
2024-08-13 17:25:51,870 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on SV_voxceleb1: loss=0.004549, beats_loss=0, ecapa_loss=0.0004549, whisper_loss=0, over 939242.00 frames. 2024-08-13 17:27:33,639 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on AT_audioset: loss=0.02369, beats_loss=0.02369, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 17:27:33,642 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-13 17:27:34,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2233810.0, ans=0.125 2024-08-13 17:28:02,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2234010.0, ans=0.0 2024-08-13 17:28:30,586 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-13 17:28:31,801 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 17:28:43,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2234210.0, ans=22.5 2024-08-13 17:28:47,972 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6050, loss[loss=0.09966, beats_loss=0.01222, ecapa_loss=0.0001513, whisper_loss=0.08592, over 14995.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001626, whisper_loss=0.09093, over 3907797.17 frames. ], batch size: 60, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:28:56,868 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 17:29:01,570 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 17:29:03,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5 2024-08-13 17:29:06,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.365e+01 2.582e+01 2.840e+01 3.927e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-13 17:29:08,688 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 15 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 17:29:16,179 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 17:29:21,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2234510.0, ans=0.1 2024-08-13 17:29:22,657 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-13 17:30:03,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6100, loss[loss=0.09184, beats_loss=0.01251, ecapa_loss=0.000173, whisper_loss=0.0776, over 19391.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01091, ecapa_loss=0.000163, whisper_loss=0.09037, over 3904801.58 frames. ], batch size: 81, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:30:20,026 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 17:30:22,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2234910.0, ans=0.04949747468305833 2024-08-13 17:30:23,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2234910.0, ans=0.125 2024-08-13 17:30:41,673 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 17:30:47,709 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 17:31:05,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2235210.0, ans=0.0 2024-08-13 17:31:15,446 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6150, loss[loss=0.08674, beats_loss=0.01278, ecapa_loss=0.0001628, whisper_loss=0.07233, over 20101.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001619, whisper_loss=0.09079, over 3900078.60 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:31:20,381 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 17:31:25,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2235310.0, ans=0.125 2024-08-13 17:31:29,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2235410.0, ans=0.0 2024-08-13 17:31:30,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2235410.0, ans=0.0 2024-08-13 17:31:32,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2235410.0, ans=0.0 2024-08-13 17:31:33,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.323e+01 2.638e+01 2.979e+01 5.632e+01, threshold=5.276e+01, percent-clipped=1.0 2024-08-13 17:31:35,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2235410.0, ans=0.125 2024-08-13 17:31:46,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.74 vs. 
limit=22.5 2024-08-13 17:31:46,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2235510.0, ans=0.125 2024-08-13 17:32:02,544 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:32:26,998 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 17:32:29,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6200, loss[loss=0.09594, beats_loss=0.01113, ecapa_loss=0.0001829, whisper_loss=0.08298, over 18831.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001601, whisper_loss=0.09091, over 3899205.80 frames. ], batch size: 79, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:32:51,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2235910.0, ans=0.125 2024-08-13 17:32:57,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2235910.0, ans=0.1 2024-08-13 17:33:00,147 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 17:33:01,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2236010.0, ans=0.125 2024-08-13 17:33:01,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2236010.0, ans=0.0 2024-08-13 17:33:09,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2236010.0, ans=0.0 2024-08-13 17:33:27,853 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 17:33:33,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=2236210.0, ans=0.02 2024-08-13 17:33:36,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2236210.0, ans=0.125 2024-08-13 17:33:48,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6250, loss[loss=0.1031, beats_loss=0.01005, ecapa_loss=0.0001236, whisper_loss=0.09177, over 15348.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001606, whisper_loss=0.09066, over 3860080.11 frames. ], batch size: 57, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:34:05,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.429e+01 2.660e+01 2.868e+01 5.842e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 17:34:10,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2236410.0, ans=0.0 2024-08-13 17:34:13,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2236410.0, ans=0.125 2024-08-13 17:34:28,382 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 17:34:29,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.34 vs. limit=22.5 2024-08-13 17:34:30,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2236510.0, ans=0.125 2024-08-13 17:34:54,106 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 17:34:59,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2236710.0, ans=0.0 2024-08-13 17:35:04,463 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6300, loss[loss=0.09873, beats_loss=0.01331, ecapa_loss=0.0001427, whisper_loss=0.08399, over 22335.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01088, ecapa_loss=0.0001598, whisper_loss=0.0911, over 3875535.82 frames. ], batch size: 90, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:35:10,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2236810.0, ans=0.0 2024-08-13 17:35:19,545 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 17:35:26,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2236910.0, ans=0.0 2024-08-13 17:35:55,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2024-08-13 17:35:56,732 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 17:36:05,899 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 17:36:13,674 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 17:36:22,907 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6350, loss[loss=0.1024, beats_loss=0.01247, ecapa_loss=0.0001455, whisper_loss=0.08849, over 20747.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001617, whisper_loss=0.09128, over 3881120.04 frames. ], batch size: 81, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:36:39,525 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 17:36:42,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.465e+01 2.710e+01 3.055e+01 1.101e+02, threshold=5.419e+01, percent-clipped=2.0 2024-08-13 17:36:43,135 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 17:36:46,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-08-13 17:36:48,450 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 17:36:49,811 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 17:36:53,747 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 17:37:21,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-13 17:37:29,695 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:37:38,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2024-08-13 17:37:45,745 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6400, loss[loss=0.1218, beats_loss=0.01009, ecapa_loss=0.0001618, whisper_loss=0.1101, over 22878.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.000161, whisper_loss=0.09176, over 3934395.84 frames. 
], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:37:49,602 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.294e-02 2024-08-13 17:38:10,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2237910.0, ans=0.125 2024-08-13 17:38:32,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2238110.0, ans=0.0 2024-08-13 17:38:42,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2238110.0, ans=0.0 2024-08-13 17:39:03,205 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6450, loss[loss=0.1258, beats_loss=0.009963, ecapa_loss=0.000178, whisper_loss=0.1141, over 23696.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001625, whisper_loss=0.09203, over 3918689.40 frames. ], batch size: 93, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:39:05,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2238310.0, ans=0.05 2024-08-13 17:39:15,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2238310.0, ans=0.1 2024-08-13 17:39:22,419 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.11 vs. limit=22.5 2024-08-13 17:39:22,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.439e+01 2.707e+01 3.110e+01 4.905e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-13 17:39:34,005 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 17:39:48,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2238510.0, ans=0.1 2024-08-13 17:39:51,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2238610.0, ans=0.0 2024-08-13 17:40:01,004 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-13 17:40:03,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2238610.0, ans=0.5 2024-08-13 17:40:08,146 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 17:40:19,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2238710.0, ans=0.0 2024-08-13 17:40:21,032 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 17:40:22,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6500, loss[loss=0.1041, beats_loss=0.009414, ecapa_loss=0.0001699, whisper_loss=0.09296, over 14480.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01073, ecapa_loss=0.0001619, whisper_loss=0.09244, over 3920271.64 frames. ], batch size: 57, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:40:33,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2238810.0, ans=6.0 2024-08-13 17:40:35,228 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
25 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 17:40:37,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2238910.0, ans=0.0 2024-08-13 17:40:41,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2238910.0, ans=0.125 2024-08-13 17:41:05,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2239010.0, ans=0.1 2024-08-13 17:41:22,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-13 17:41:27,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2239210.0, ans=0.125 2024-08-13 17:41:29,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2239210.0, ans=0.09899494936611666 2024-08-13 17:41:29,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2024-08-13 17:41:39,274 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6550, loss[loss=0.09717, beats_loss=0.00872, ecapa_loss=0.0001971, whisper_loss=0.08648, over 20418.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001614, whisper_loss=0.092, over 3908484.30 frames. 
], batch size: 88, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:41:50,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2239310.0, ans=0.1 2024-08-13 17:41:57,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.430e+01 2.695e+01 2.938e+01 3.674e+01, threshold=5.390e+01, percent-clipped=0.0 2024-08-13 17:42:16,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2239510.0, ans=0.125 2024-08-13 17:42:22,068 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-13 17:42:37,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2239610.0, ans=0.0 2024-08-13 17:42:45,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2239710.0, ans=0.125 2024-08-13 17:42:54,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-13 17:42:58,578 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6600, loss[loss=0.09493, beats_loss=0.01162, ecapa_loss=0.000153, whisper_loss=0.08178, over 15877.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001641, whisper_loss=0.09206, over 3937704.14 frames. ], batch size: 64, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:43:07,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2239810.0, ans=0.125 2024-08-13 17:43:18,531 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
17 from LS+wenet, 13 from Vox, 24 from AS
2024-08-13 17:43:44,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2240010.0, ans=0.0
2024-08-13 17:43:48,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2240010.0, ans=0.0
2024-08-13 17:43:52,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2240110.0, ans=0.125
2024-08-13 17:44:03,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-08-13 17:44:07,002 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 from AS
2024-08-13 17:44:16,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2240210.0, ans=0.125
2024-08-13 17:44:17,786 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 from AS
2024-08-13 17:44:26,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6650, loss[loss=0.09335, beats_loss=0.01332, ecapa_loss=0.0001595, whisper_loss=0.07843, over 15416.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01068, ecapa_loss=0.0001637, whisper_loss=0.09196, over 3937658.85 frames. ], batch size: 62, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:44:46,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.433e+01 2.609e+01 2.879e+01 3.999e+01, threshold=5.218e+01, percent-clipped=0.0
2024-08-13 17:45:18,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=2240610.0, ans=0.2
2024-08-13 17:45:25,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2240610.0, ans=0.0
2024-08-13 17:45:26,491 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 24 from Vox, 37 from AS
2024-08-13 17:45:30,502 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 from AS
2024-08-13 17:45:50,201 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6700, loss[loss=0.09394, beats_loss=0.01094, ecapa_loss=0.0002117, whisper_loss=0.08088, over 13741.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.0001643, whisper_loss=0.09182, over 3961910.73 frames. ], batch size: 56, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:46:05,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0
2024-08-13 17:46:29,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0
2024-08-13 17:46:38,042 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 from AS
2024-08-13 17:46:50,512 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 from AS
2024-08-13 17:46:53,969 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 from AS
2024-08-13 17:47:07,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2241210.0, ans=0.0
2024-08-13 17:47:08,698 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 from AS
2024-08-13 17:47:13,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2241210.0, ans=0.0
2024-08-13 17:47:15,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2241310.0, ans=10.0
2024-08-13 17:47:15,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6750, loss[loss=0.1077, beats_loss=0.009238, ecapa_loss=0.0001462, whisper_loss=0.09696, over 23716.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001642, whisper_loss=0.09195, over 3957067.63 frames. ], batch size: 92, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:47:28,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2241310.0, ans=0.1
2024-08-13 17:47:37,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.480e+01 2.825e+01 3.141e+01 1.321e+02, threshold=5.651e+01, percent-clipped=2.0
2024-08-13 17:48:02,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2241510.0, ans=0.125
2024-08-13 17:48:09,784 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 from AS
2024-08-13 17:48:10,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2241610.0, ans=0.125
2024-08-13 17:48:25,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.62 vs. limit=10.0
2024-08-13 17:48:38,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6800, loss[loss=0.06946, beats_loss=0.009492, ecapa_loss=0.0001561, whisper_loss=0.05841, over 17613.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001627, whisper_loss=0.0914, over 3949677.65 frames. ], batch size: 69, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:48:43,126 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 25 from Vox, 30 from AS
2024-08-13 17:48:45,279 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS
2024-08-13 17:48:45,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2241810.0, ans=0.1
2024-08-13 17:48:55,848 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 20 from Vox, 14 from AS
2024-08-13 17:48:59,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2241910.0, ans=0.09899494936611666
2024-08-13 17:49:00,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2241910.0, ans=0.125
2024-08-13 17:49:10,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2242010.0, ans=0.2
2024-08-13 17:49:13,742 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 12 from LS+wenet, 23 from Vox, 29 from AS
2024-08-13 17:49:34,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2242110.0, ans=0.07
2024-08-13 17:49:37,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2242110.0, ans=0.125
2024-08-13 17:49:40,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0
2024-08-13 17:50:00,773 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6850, loss[loss=0.1071, beats_loss=0.01192, ecapa_loss=0.0001687, whisper_loss=0.09346, over 22721.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.000164, whisper_loss=0.09151, over 3933821.26 frames. ], batch size: 93, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:50:01,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2242310.0, ans=0.2
2024-08-13 17:50:01,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0
2024-08-13 17:50:13,806 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 9 from Vox, 30 from AS
2024-08-13 17:50:18,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2242410.0, ans=0.0
2024-08-13 17:50:20,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.361e+01 2.636e+01 2.867e+01 1.284e+02, threshold=5.272e+01, percent-clipped=1.0
2024-08-13 17:50:21,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2242410.0, ans=0.1
2024-08-13 17:50:35,887 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 from AS
2024-08-13 17:50:51,690 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=22.5
2024-08-13 17:51:06,533 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 16 from Vox, 38 from AS
2024-08-13 17:51:11,825 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 from AS
2024-08-13 17:51:20,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6900, loss[loss=0.1045, beats_loss=0.01005, ecapa_loss=0.0001519, whisper_loss=0.09291, over 22890.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001641, whisper_loss=0.09174, over 3914552.73 frames. ], batch size: 90, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:51:22,599 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 17:51:37,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2242910.0, ans=0.125
2024-08-13 17:51:42,638 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.566e+00
2024-08-13 17:51:46,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2242910.0, ans=10.0
2024-08-13 17:51:48,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0
2024-08-13 17:52:00,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.58 vs. limit=22.5
2024-08-13 17:52:11,118 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 from AS
2024-08-13 17:52:32,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0
2024-08-13 17:52:42,053 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 6950, loss[loss=0.1095, beats_loss=0.01031, ecapa_loss=0.000157, whisper_loss=0.09762, over 21945.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01072, ecapa_loss=0.0001634, whisper_loss=0.09176, over 3875672.92 frames. ], batch size: 88, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:52:55,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2243310.0, ans=0.025
2024-08-13 17:53:01,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2243410.0, ans=0.0
2024-08-13 17:53:02,338 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=12.0
2024-08-13 17:53:02,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.338e+01 2.546e+01 2.937e+01 5.530e+01, threshold=5.093e+01, percent-clipped=1.0
2024-08-13 17:53:08,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.49 vs. limit=10.0
2024-08-13 17:53:18,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2024-08-13 17:53:30,357 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 17:53:42,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2243610.0, ans=0.1
2024-08-13 17:53:53,846 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.217e+05
2024-08-13 17:54:02,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2243810.0, ans=0.1
2024-08-13 17:54:03,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7000, loss[loss=0.09204, beats_loss=0.01142, ecapa_loss=0.000154, whisper_loss=0.07908, over 20433.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001623, whisper_loss=0.09117, over 3864969.38 frames. ], batch size: 81, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:54:09,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2243810.0, ans=0.125
2024-08-13 17:54:25,864 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 28 from LS+wenet, 12 from Vox, 23 from AS
2024-08-13 17:54:35,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2244010.0, ans=0.1
2024-08-13 17:54:40,283 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 34 from Vox, 29 from AS
2024-08-13 17:54:57,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2244110.0, ans=0.125
2024-08-13 17:55:01,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2244110.0, ans=0.035
2024-08-13 17:55:26,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7050, loss[loss=0.1068, beats_loss=0.01106, ecapa_loss=0.0001903, whisper_loss=0.09384, over 21063.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01074, ecapa_loss=0.0001628, whisper_loss=0.09184, over 3869560.28 frames. ], batch size: 87, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:55:29,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2244310.0, ans=0.125
2024-08-13 17:55:48,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.461e+01 2.675e+01 2.991e+01 1.291e+02, threshold=5.351e+01, percent-clipped=1.0
2024-08-13 17:55:48,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2244410.0, ans=0.125
2024-08-13 17:56:04,026 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 15 from Vox, 35 from AS
2024-08-13 17:56:42,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2244710.0, ans=0.1
2024-08-13 17:56:47,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7100, loss[loss=0.1005, beats_loss=0.009603, ecapa_loss=0.0001942, whisper_loss=0.08899, over 14101.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001631, whisper_loss=0.09126, over 3842640.23 frames. ], batch size: 56, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:56:52,157 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 17:56:59,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2244810.0, ans=0.125
2024-08-13 17:57:03,095 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2024-08-13 17:57:04,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2244910.0, ans=0.0
2024-08-13 17:57:11,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0
2024-08-13 17:57:35,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2245110.0, ans=0.125
2024-08-13 17:57:44,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2245110.0, ans=0.125
2024-08-13 17:57:48,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2245110.0, ans=0.125
2024-08-13 17:58:04,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2245210.0, ans=0.0
2024-08-13 17:58:08,745 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7150, loss[loss=0.1145, beats_loss=0.01196, ecapa_loss=0.0001546, whisper_loss=0.101, over 23640.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.0001625, whisper_loss=0.0919, over 3898897.53 frames. ], batch size: 92, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:58:10,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2245310.0, ans=0.1
2024-08-13 17:58:31,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.399e+01 2.676e+01 3.068e+01 5.307e+01, threshold=5.353e+01, percent-clipped=0.0
2024-08-13 17:58:40,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=2245410.0, ans=0.02
2024-08-13 17:58:42,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2245510.0, ans=0.125
2024-08-13 17:58:42,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2245510.0, ans=0.125
2024-08-13 17:59:12,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.03 vs. limit=10.0
2024-08-13 17:59:29,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2245710.0, ans=0.1
2024-08-13 17:59:30,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2245710.0, ans=0.125
2024-08-13 17:59:33,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7200, loss[loss=0.1153, beats_loss=0.007504, ecapa_loss=0.0001649, whisper_loss=0.1062, over 18550.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001633, whisper_loss=0.09179, over 3898838.29 frames. ], batch size: 72, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:59:35,200 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 20 from Vox, 41 from AS
2024-08-13 17:59:37,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2245810.0, ans=0.125
2024-08-13 17:59:44,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0
2024-08-13 17:59:50,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2245910.0, ans=0.0
2024-08-13 17:59:54,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2245910.0, ans=0.1
2024-08-13 17:59:56,300 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 14 from Vox, 33 from AS
2024-08-13 18:00:03,402 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 18:00:05,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2246010.0, ans=0.0
2024-08-13 18:00:11,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2246010.0, ans=0.0
2024-08-13 18:00:18,285 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 from AS
2024-08-13 18:00:22,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5
2024-08-13 18:00:25,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2246110.0, ans=0.125
2024-08-13 18:00:31,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2246110.0, ans=0.125
2024-08-13 18:00:49,764 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 from AS
2024-08-13 18:00:53,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7250, loss[loss=0.09733, beats_loss=0.01045, ecapa_loss=0.0001591, whisper_loss=0.08528, over 19582.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01079, ecapa_loss=0.0001618, whisper_loss=0.09218, over 3918686.14 frames. ], batch size: 81, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:00:54,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2246310.0, ans=10.0
2024-08-13 18:00:59,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0
2024-08-13 18:01:15,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.504e+01 2.815e+01 3.088e+01 1.145e+02, threshold=5.629e+01, percent-clipped=1.0
2024-08-13 18:01:18,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.20 vs. limit=22.5
2024-08-13 18:02:12,948 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 18:02:15,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7300, loss[loss=0.1029, beats_loss=0.009576, ecapa_loss=0.0001817, whisper_loss=0.09151, over 17687.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01082, ecapa_loss=0.0001617, whisper_loss=0.09231, over 3935108.07 frames. ], batch size: 73, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:02:19,523 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 18:02:33,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2246910.0, ans=0.125
2024-08-13 18:02:40,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0
2024-08-13 18:02:48,336 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 from AS
2024-08-13 18:02:54,949 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:02:57,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5
2024-08-13 18:02:57,902 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 from AS
2024-08-13 18:02:59,381 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 from AS
2024-08-13 18:03:12,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2247110.0, ans=0.125
2024-08-13 18:03:17,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2247110.0, ans=0.1
2024-08-13 18:03:36,849 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7350, loss[loss=0.1041, beats_loss=0.01104, ecapa_loss=0.0001829, whisper_loss=0.09122, over 21546.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001625, whisper_loss=0.09132, over 3916597.03 frames. ], batch size: 91, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:03:47,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2247310.0, ans=0.125
2024-08-13 18:03:58,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.425e+01 2.697e+01 3.109e+01 4.252e+01, threshold=5.395e+01, percent-clipped=0.0
2024-08-13 18:04:14,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2247510.0, ans=0.125
2024-08-13 18:04:22,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2247510.0, ans=0.0
2024-08-13 18:04:24,861 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 21 from Vox, 49 from AS
2024-08-13 18:04:31,595 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS
2024-08-13 18:04:59,185 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7400, loss[loss=0.108, beats_loss=0.01007, ecapa_loss=0.0001458, whisper_loss=0.09643, over 23484.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001622, whisper_loss=0.09142, over 3907602.69 frames. ], batch size: 91, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:05:08,419 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 24 from Vox, 41 from AS
2024-08-13 18:05:13,634 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 from AS
2024-08-13 18:05:24,030 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 18:05:26,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5
2024-08-13 18:05:29,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5
2024-08-13 18:05:38,291 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 18:06:17,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7450, loss[loss=0.09779, beats_loss=0.007879, ecapa_loss=0.0002053, whisper_loss=0.08786, over 15941.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01074, ecapa_loss=0.0001628, whisper_loss=0.09234, over 3937223.01 frames. ], batch size: 66, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:06:26,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0
2024-08-13 18:06:32,112 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 from AS
2024-08-13 18:06:37,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.475e+01 2.713e+01 3.023e+01 4.384e+01, threshold=5.427e+01, percent-clipped=0.0
2024-08-13 18:06:38,383 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS
2024-08-13 18:06:51,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2248510.0, ans=0.125
2024-08-13 18:07:02,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2248510.0, ans=0.025
2024-08-13 18:07:04,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0
2024-08-13 18:07:05,409 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 from AS
2024-08-13 18:07:30,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2248710.0, ans=0.0
2024-08-13 18:07:37,738 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7500, loss[loss=0.1377, beats_loss=0.008869, ecapa_loss=0.0001584, whisper_loss=0.1273, over 18390.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001631, whisper_loss=0.09156, over 3921346.24 frames. ], batch size: 67, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:07:38,056 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 19 from Vox, 18 from AS
2024-08-13 18:07:42,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2248810.0, ans=0.125
2024-08-13 18:07:48,806 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 from AS
2024-08-13 18:08:07,168 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 from AS
2024-08-13 18:08:12,378 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0
2024-08-13 18:08:25,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2249010.0, ans=0.1
2024-08-13 18:08:34,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2249110.0, ans=0.125
2024-08-13 18:08:40,246 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 from AS
2024-08-13 18:08:45,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=22.5
2024-08-13 18:08:57,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2249310.0, ans=0.125
2024-08-13 18:08:58,553 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7550, loss[loss=0.09696, beats_loss=0.01073, ecapa_loss=0.0001609, whisper_loss=0.08462, over 18634.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001629, whisper_loss=0.09125, over 3890489.92 frames. ], batch size: 74, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:09:08,004 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 from AS
2024-08-13 18:09:19,255 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.370e+01 2.691e+01 3.011e+01 5.049e+01, threshold=5.381e+01, percent-clipped=0.0
2024-08-13 18:09:21,327 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.301e-01
2024-08-13 18:09:33,124 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 from AS
2024-08-13 18:10:14,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2249710.0, ans=0.05
2024-08-13 18:10:17,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7600, loss[loss=0.09806, beats_loss=0.01334, ecapa_loss=0.0001358, whisper_loss=0.08337, over 18404.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001614, whisper_loss=0.09138, over 3884158.14 frames. ], batch size: 74, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:10:21,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2249810.0, ans=0.0
2024-08-13 18:10:24,600 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS
2024-08-13 18:10:27,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2249810.0, ans=0.0
2024-08-13 18:10:36,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2249910.0, ans=0.1
2024-08-13 18:10:41,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2249910.0, ans=0.125
2024-08-13 18:10:55,140 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 from AS
2024-08-13 18:11:08,553 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS
2024-08-13 18:11:21,236 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 from AS
2024-08-13 18:11:22,955 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 22 from LS+wenet, 23 from Vox, 42 from AS
2024-08-13 18:11:25,146 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=12.0
2024-08-13 18:11:38,312 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7650, loss[loss=0.1055, beats_loss=0.01159, ecapa_loss=0.000143, whisper_loss=0.09247, over 23460.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001614, whisper_loss=0.09148, over 3901186.42 frames. ], batch size: 93, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:11:57,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.418e+01 2.664e+01 3.048e+01 4.401e+01, threshold=5.328e+01, percent-clipped=0.0
2024-08-13 18:12:48,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2250710.0, ans=0.125
2024-08-13 18:12:52,366 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7700, loss[loss=0.0901, beats_loss=0.01162, ecapa_loss=0.0001207, whisper_loss=0.07728, over 14552.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001623, whisper_loss=0.09131, over 3902506.63 frames. ], batch size: 55, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:12:59,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2250810.0, ans=0.125
2024-08-13 18:13:28,561 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 from AS
2024-08-13 18:14:05,652 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7750, loss[loss=0.1182, beats_loss=0.009916, ecapa_loss=0.0001485, whisper_loss=0.1068, over 23168.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.0001626, whisper_loss=0.09137, over 3903592.07 frames. ], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:14:06,054 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 from AS
2024-08-13 18:14:06,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2251310.0, ans=0.125
2024-08-13 18:14:07,189 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 from AS
2024-08-13 18:14:16,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0
2024-08-13 18:14:23,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2251410.0, ans=0.0
2024-08-13 18:14:24,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.510e+01 2.727e+01 3.092e+01 1.354e+02, threshold=5.455e+01, percent-clipped=2.0
2024-08-13 18:14:43,692 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 14 from LS+wenet, 25 from Vox, 27 from AS
2024-08-13 18:14:55,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2251610.0, ans=0.0
2024-08-13 18:15:06,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2251710.0, ans=0.0
2024-08-13 18:15:11,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2251710.0, ans=0.2
2024-08-13 18:15:17,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7800, loss[loss=0.1119, beats_loss=0.00951, ecapa_loss=0.0002385, whisper_loss=0.1, over 20708.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001613, whisper_loss=0.09083, over 3902009.71 frames. ], batch size: 90, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:15:31,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2251910.0, ans=10.0
2024-08-13 18:15:47,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2252010.0, ans=0.125
2024-08-13 18:16:26,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2252210.0, ans=0.125
2024-08-13 18:16:30,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7850, loss[loss=0.09054, beats_loss=0.01088, ecapa_loss=0.0001766, whisper_loss=0.0779, over 20184.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0108, ecapa_loss=0.0001607, whisper_loss=0.09071, over 3873355.46 frames. ], batch size: 82, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:16:46,205 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 from AS
2024-08-13 18:16:47,514 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts.
15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 18:16:48,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.348e+01 2.635e+01 3.053e+01 4.732e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-13 18:16:53,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2252410.0, ans=0.125 2024-08-13 18:16:55,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2252410.0, ans=0.0 2024-08-13 18:16:56,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2252410.0, ans=0.125 2024-08-13 18:17:21,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2252610.0, ans=0.0 2024-08-13 18:17:44,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7900, loss[loss=0.09411, beats_loss=0.01098, ecapa_loss=0.0001778, whisper_loss=0.08135, over 19135.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001619, whisper_loss=0.09096, over 3851619.77 frames. ], batch size: 77, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:17:53,794 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 18:17:58,608 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:18:01,196 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-13 18:18:03,775 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 18:18:18,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2253010.0, ans=0.125 2024-08-13 18:18:21,386 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
14 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 18:18:23,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2253010.0, ans=0.125 2024-08-13 18:18:33,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=10.0 2024-08-13 18:18:42,504 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-13 18:18:57,420 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 7950, loss[loss=0.1071, beats_loss=0.01168, ecapa_loss=0.0001391, whisper_loss=0.09403, over 20652.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0109, ecapa_loss=0.0001623, whisper_loss=0.0902, over 3864739.78 frames. ], batch size: 81, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:18:57,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2253310.0, ans=0.1 2024-08-13 18:19:12,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2253410.0, ans=0.125 2024-08-13 18:19:12,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2253410.0, ans=0.2 2024-08-13 18:19:15,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.363e+01 2.645e+01 3.044e+01 5.205e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-13 18:19:16,039 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 18:19:19,813 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 18:19:27,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2253510.0, ans=0.1 2024-08-13 18:19:36,067 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 18:19:37,373 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 18:19:44,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2253610.0, ans=0.125 2024-08-13 18:19:46,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2253610.0, ans=0.125 2024-08-13 18:20:13,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8000, loss[loss=0.1053, beats_loss=0.009164, ecapa_loss=0.0001817, whisper_loss=0.09432, over 18470.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0109, ecapa_loss=0.0001628, whisper_loss=0.09027, over 3873546.55 frames. ], batch size: 75, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:21:02,856 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.32 vs. limit=22.5 2024-08-13 18:21:05,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2254110.0, ans=0.125 2024-08-13 18:21:09,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2254110.0, ans=0.1 2024-08-13 18:21:13,510 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-13 18:21:26,850 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8050, loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001617, whisper_loss=0.09152, over 21791.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01091, ecapa_loss=0.0001626, whisper_loss=0.08993, over 3884994.63 frames. ], batch size: 86, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:21:27,042 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-13 18:21:38,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2254310.0, ans=0.1 2024-08-13 18:21:39,895 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 18:21:45,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2254410.0, ans=0.125 2024-08-13 18:21:46,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.332e+01 2.558e+01 3.003e+01 5.582e+01, threshold=5.115e+01, percent-clipped=1.0 2024-08-13 18:21:50,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2254410.0, ans=0.0 2024-08-13 18:21:52,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2254410.0, ans=0.0 2024-08-13 18:22:03,413 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 18:22:09,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2254610.0, ans=0.125 2024-08-13 18:22:10,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2254610.0, ans=15.0 2024-08-13 18:22:13,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2254610.0, ans=0.0 2024-08-13 18:22:15,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2254610.0, ans=0.0 2024-08-13 18:22:23,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2254710.0, ans=0.125 2024-08-13 18:22:28,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2254710.0, ans=0.0 2024-08-13 18:22:29,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2254710.0, ans=0.125 2024-08-13 18:22:38,351 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8100, loss[loss=0.1052, beats_loss=0.01175, ecapa_loss=0.000158, whisper_loss=0.09187, over 22962.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001629, whisper_loss=0.0905, over 3889053.58 frames. ], batch size: 93, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:22:49,704 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 18:23:12,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2255010.0, ans=0.0 2024-08-13 18:23:16,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2255010.0, ans=0.125 2024-08-13 18:23:37,174 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 18:23:43,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2255210.0, ans=0.1 2024-08-13 18:23:43,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2255210.0, ans=0.2 2024-08-13 18:23:43,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-13 18:23:49,963 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8150, loss[loss=0.1145, beats_loss=0.008417, ecapa_loss=0.0001733, whisper_loss=0.1044, over 16995.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.0001634, whisper_loss=0.09183, over 3892706.89 frames. ], batch size: 65, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:24:09,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.440e+01 2.841e+01 3.164e+01 5.500e+01, threshold=5.681e+01, percent-clipped=1.0 2024-08-13 18:24:27,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2255510.0, ans=0.125 2024-08-13 18:24:36,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2255610.0, ans=0.125 2024-08-13 18:24:37,455 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
35 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-13 18:24:42,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2255610.0, ans=0.0 2024-08-13 18:24:49,483 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 18:25:02,914 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8200, loss[loss=0.09935, beats_loss=0.01191, ecapa_loss=0.0001874, whisper_loss=0.08556, over 21946.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001626, whisper_loss=0.09126, over 3893826.45 frames. ], batch size: 89, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:25:14,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2255810.0, ans=0.0 2024-08-13 18:25:51,090 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-13 18:26:07,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2256210.0, ans=0.0 2024-08-13 18:26:14,154 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8250, loss[loss=0.1011, beats_loss=0.01069, ecapa_loss=0.0001366, whisper_loss=0.08904, over 15134.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001634, whisper_loss=0.09181, over 3907870.48 frames. ], batch size: 56, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:26:32,166 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.303e+01 2.576e+01 2.826e+01 3.811e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-13 18:26:45,892 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
21 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-13 18:26:51,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2256510.0, ans=0.125 2024-08-13 18:26:58,556 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-13 18:27:00,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-08-13 18:27:01,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2256610.0, ans=0.1 2024-08-13 18:27:04,224 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 18:27:21,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-08-13 18:27:24,816 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04663357511162758, model_norm_threshold=51.52228546142578 2024-08-13 18:27:25,044 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.95, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.156e+06, grad_sumsq=1.333e+05, orig_rms_sq=8.675e+00 2024-08-13 18:27:25,069 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8300, loss[loss=0.09623, beats_loss=0.009399, ecapa_loss=0.0001847, whisper_loss=0.08499, over 15176.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001621, whisper_loss=0.09142, over 3890972.07 frames. ], batch size: 64, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:27:32,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.95 vs. 
limit=22.5 2024-08-13 18:27:42,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2256910.0, ans=0.0 2024-08-13 18:27:53,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2257010.0, ans=0.0 2024-08-13 18:28:02,727 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 18:28:11,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2257110.0, ans=0.125 2024-08-13 18:28:30,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8350, loss[loss=0.1169, beats_loss=0.01167, ecapa_loss=0.0001691, whisper_loss=0.1035, over 19232.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01081, ecapa_loss=0.0001625, whisper_loss=0.09091, over 3883763.10 frames. ], batch size: 76, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:28:40,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2257310.0, ans=0.1 2024-08-13 18:28:47,997 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.502e+01 2.789e+01 3.217e+01 1.105e+03, threshold=5.579e+01, percent-clipped=3.0 2024-08-13 18:28:58,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2257510.0, ans=0.07 2024-08-13 18:29:11,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2024-08-13 18:29:15,077 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 18:29:20,278 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
16 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 18:29:25,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2257710.0, ans=0.1 2024-08-13 18:29:35,833 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8400, loss[loss=0.1156, beats_loss=0.01089, ecapa_loss=0.0001373, whisper_loss=0.1034, over 21882.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001626, whisper_loss=0.09139, over 3905385.44 frames. ], batch size: 84, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:29:39,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-08-13 18:30:06,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2258010.0, ans=0.125 2024-08-13 18:30:19,443 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 18:30:30,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2258210.0, ans=0.2 2024-08-13 18:30:33,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2258210.0, ans=0.125 2024-08-13 18:30:40,414 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-13 18:30:42,833 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8450, loss[loss=0.08703, beats_loss=0.01326, ecapa_loss=0.0001603, whisper_loss=0.07217, over 20610.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01074, ecapa_loss=0.0001637, whisper_loss=0.0917, over 3888110.15 frames. 
], batch size: 84, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:30:59,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.525e+01 2.749e+01 3.077e+01 1.697e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 18:31:00,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2258410.0, ans=0.2 2024-08-13 18:31:05,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-13 18:31:12,851 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 18:31:19,635 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 18:31:20,890 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 18:31:29,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2258610.0, ans=0.1 2024-08-13 18:31:31,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2258610.0, ans=0.125 2024-08-13 18:31:32,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2258610.0, ans=0.125 2024-08-13 18:31:34,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2258710.0, ans=0.125 2024-08-13 18:31:36,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.74 vs. 
limit=22.5 2024-08-13 18:31:45,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2258710.0, ans=0.125 2024-08-13 18:31:48,364 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8500, loss[loss=0.08558, beats_loss=0.01214, ecapa_loss=0.000168, whisper_loss=0.07176, over 22092.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001635, whisper_loss=0.0917, over 3888096.24 frames. ], batch size: 91, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:31:48,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-13 18:31:54,727 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 18:32:23,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2259010.0, ans=15.0 2024-08-13 18:32:24,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2259010.0, ans=0.125 2024-08-13 18:32:28,242 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 18:32:33,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2259110.0, ans=0.0 2024-08-13 18:32:41,336 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 18:32:54,185 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8550, loss[loss=0.09492, beats_loss=0.01267, ecapa_loss=0.0001614, whisper_loss=0.08064, over 16469.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001633, whisper_loss=0.0917, over 3877514.49 frames. 
], batch size: 66, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:32:54,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2259310.0, ans=0.2 2024-08-13 18:32:56,956 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 18:33:02,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2259310.0, ans=0.2 2024-08-13 18:33:04,547 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 18:33:10,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.379e+01 2.646e+01 2.938e+01 4.520e+01, threshold=5.292e+01, percent-clipped=0.0 2024-08-13 18:33:19,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2259510.0, ans=0.125 2024-08-13 18:33:20,206 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 18:33:30,944 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 18:33:47,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2259710.0, ans=0.125 2024-08-13 18:33:52,683 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 34 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 18:33:58,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8600, loss[loss=0.103, beats_loss=0.01086, ecapa_loss=0.0001122, whisper_loss=0.09099, over 20345.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01067, ecapa_loss=0.0001634, whisper_loss=0.09231, over 3874511.58 frames. 
], batch size: 75, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:34:06,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.00 vs. limit=22.5 2024-08-13 18:34:12,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2259910.0, ans=0.5 2024-08-13 18:35:04,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.73 vs. limit=15.0 2024-08-13 18:35:06,606 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8650, loss[loss=0.08659, beats_loss=0.01309, ecapa_loss=0.0001427, whisper_loss=0.07207, over 16118.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001633, whisper_loss=0.09129, over 3864829.95 frames. ], batch size: 65, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:35:10,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-13 18:35:11,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2260310.0, ans=0.125 2024-08-13 18:35:13,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2260310.0, ans=0.0 2024-08-13 18:35:24,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.420e+01 2.581e+01 2.912e+01 4.652e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-13 18:35:35,065 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 18:35:36,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2260510.0, ans=0.0 2024-08-13 18:35:43,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2024-08-13 18:35:55,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-13 18:36:11,845 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 18:36:14,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8700, loss[loss=0.08266, beats_loss=0.01063, ecapa_loss=0.0001518, whisper_loss=0.07052, over 18759.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001628, whisper_loss=0.09094, over 3855027.99 frames. ], batch size: 76, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:36:29,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2260910.0, ans=0.0 2024-08-13 18:36:44,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2260910.0, ans=0.125 2024-08-13 18:36:46,722 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 18:37:08,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2261110.0, ans=0.0 2024-08-13 18:37:31,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.24 vs. 
limit=15.0 2024-08-13 18:37:32,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8750, loss[loss=0.09304, beats_loss=0.009881, ecapa_loss=0.0002065, whisper_loss=0.08109, over 21408.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001631, whisper_loss=0.09136, over 3830844.53 frames. ], batch size: 92, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:37:50,597 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.392e+01 2.713e+01 3.025e+01 4.261e+01, threshold=5.425e+01, percent-clipped=0.0 2024-08-13 18:38:05,986 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 18:38:10,538 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:38:29,534 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2024-08-13 18:38:41,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.49 vs. limit=22.5 2024-08-13 18:38:53,503 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 18:38:54,721 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8800, loss[loss=0.1079, beats_loss=0.0111, ecapa_loss=0.0001569, whisper_loss=0.09523, over 23355.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0108, ecapa_loss=0.000162, whisper_loss=0.09219, over 3894796.83 frames. ], batch size: 93, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:39:09,031 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 18:39:11,137 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 18:39:11,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2261810.0, ans=0.125 2024-08-13 18:39:26,862 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 18:39:35,439 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 18:39:41,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-13 18:40:03,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2262110.0, ans=0.125 2024-08-13 18:40:10,724 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-13 18:40:21,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2262210.0, ans=0.2 2024-08-13 18:40:33,284 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8850, loss[loss=0.05923, beats_loss=0.0154, ecapa_loss=0.0001109, whisper_loss=0.04273, over 17103.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01086, ecapa_loss=0.0001625, whisper_loss=0.09065, over 3854680.64 frames. ], batch size: 71, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:40:43,037 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 18:40:56,094 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
38 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 18:40:57,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.352e+01 2.612e+01 3.083e+01 5.604e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 18:40:58,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2262410.0, ans=0.125 2024-08-13 18:41:01,110 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-13 18:41:05,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=12.0 2024-08-13 18:41:07,537 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 18:41:28,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2262510.0, ans=0.95 2024-08-13 18:41:50,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2262710.0, ans=0.0 2024-08-13 18:41:58,603 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:42:09,444 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8900, loss[loss=0.1032, beats_loss=0.01138, ecapa_loss=0.0001036, whisper_loss=0.09077, over 19679.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001613, whisper_loss=0.09134, over 3874153.74 frames. 
], batch size: 73, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:42:09,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2262810.0, ans=0.0 2024-08-13 18:42:39,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2262910.0, ans=0.125 2024-08-13 18:43:02,787 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:43:22,986 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 18:43:28,299 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 18:43:33,212 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 8950, loss[loss=0.124, beats_loss=0.007288, ecapa_loss=0.0001599, whisper_loss=0.1151, over 15579.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001619, whisper_loss=0.09135, over 3865857.92 frames. 
], batch size: 57, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:43:37,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2263310.0, ans=0.0 2024-08-13 18:43:44,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2263410.0, ans=0.2 2024-08-13 18:43:49,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.368e+01 2.604e+01 2.902e+01 4.386e+01, threshold=5.207e+01, percent-clipped=0.0 2024-08-13 18:43:50,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2263410.0, ans=0.09899494936611666 2024-08-13 18:43:56,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2263410.0, ans=0.5 2024-08-13 18:43:59,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2263510.0, ans=0.1 2024-08-13 18:44:14,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-13 18:44:22,759 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 18:44:38,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9000, loss[loss=0.1122, beats_loss=0.009566, ecapa_loss=0.0001729, whisper_loss=0.1009, over 22025.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001624, whisper_loss=0.09185, over 3859825.50 frames. ], batch size: 84, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:44:38,237 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 18:45:17,653 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005571, whisper_loss=0.2482, over 922467.00 frames. 
2024-08-13 18:45:37,223 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on SV_voxceleb1: loss=0.004514, beats_loss=0, ecapa_loss=0.0004514, whisper_loss=0, over 939242.00 frames. 2024-08-13 18:47:38,923 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3889, 2.1235, 2.4973, 1.3179], device='cuda:1') 2024-08-13 18:47:39,554 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 18:47:39,557 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-13 18:48:26,744 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 18:48:35,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2024-08-13 18:48:47,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9050, loss[loss=0.1093, beats_loss=0.009477, ecapa_loss=0.000153, whisper_loss=0.09832, over 23300.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001626, whisper_loss=0.092, over 3898416.74 frames. ], batch size: 92, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:48:52,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2264310.0, ans=0.0 2024-08-13 18:48:56,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0 2024-08-13 18:48:59,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. 
limit=15.0 2024-08-13 18:49:05,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.411e+01 2.657e+01 3.042e+01 5.076e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-13 18:49:06,848 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 18:49:24,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2264510.0, ans=0.0 2024-08-13 18:49:29,566 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:49:30,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2264610.0, ans=0.125 2024-08-13 18:49:43,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2264710.0, ans=0.125 2024-08-13 18:49:46,715 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 18:49:50,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-13 18:49:57,096 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9100, loss[loss=0.1244, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.1126, over 18324.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001628, whisper_loss=0.09176, over 3878805.08 frames. ], batch size: 69, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:49:57,188 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 18:50:00,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. 
limit=15.0 2024-08-13 18:50:22,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2264910.0, ans=0.1 2024-08-13 18:50:25,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2265010.0, ans=0.125 2024-08-13 18:50:31,912 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 18:50:35,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2265010.0, ans=0.0 2024-08-13 18:50:40,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2265110.0, ans=0.125 2024-08-13 18:50:42,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2265110.0, ans=0.125 2024-08-13 18:50:54,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2265210.0, ans=0.125 2024-08-13 18:50:55,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2265210.0, ans=0.1 2024-08-13 18:50:58,141 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 18:50:58,408 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.129e+01 2024-08-13 18:50:59,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2265210.0, ans=0.0 2024-08-13 18:51:00,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. 
limit=15.0 2024-08-13 18:51:02,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2265210.0, ans=0.0 2024-08-13 18:51:05,170 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 18:51:07,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9150, loss[loss=0.105, beats_loss=0.01275, ecapa_loss=0.0001396, whisper_loss=0.0908, over 23406.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001628, whisper_loss=0.09077, over 3864394.28 frames. ], batch size: 93, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:51:07,494 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 18:51:12,118 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-13 18:51:17,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2265310.0, ans=0.04949747468305833 2024-08-13 18:51:26,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.406e+01 2.793e+01 3.104e+01 4.161e+01, threshold=5.587e+01, percent-clipped=0.0 2024-08-13 18:51:33,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2265410.0, ans=0.0 2024-08-13 18:51:36,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2265510.0, ans=0.1 2024-08-13 18:51:45,595 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 18:51:46,912 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
20 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-13 18:51:47,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2265510.0, ans=10.0 2024-08-13 18:51:49,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2265610.0, ans=0.125 2024-08-13 18:51:54,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2265610.0, ans=0.2 2024-08-13 18:52:07,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2265710.0, ans=0.015 2024-08-13 18:52:11,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-13 18:52:12,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2265710.0, ans=0.1 2024-08-13 18:52:17,737 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9200, loss[loss=0.1144, beats_loss=0.009505, ecapa_loss=0.0001575, whisper_loss=0.1034, over 18175.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.000162, whisper_loss=0.091, over 3880400.71 frames. ], batch size: 68, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:52:30,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.49 vs. limit=10.0 2024-08-13 18:52:40,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. 
limit=15.0 2024-08-13 18:52:45,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2266010.0, ans=0.0 2024-08-13 18:52:59,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2266110.0, ans=0.125 2024-08-13 18:53:06,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-13 18:53:14,414 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 18:53:18,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.89 vs. limit=22.5 2024-08-13 18:53:21,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2266210.0, ans=6.0 2024-08-13 18:53:24,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9250, loss[loss=0.09104, beats_loss=0.01211, ecapa_loss=0.0001468, whisper_loss=0.07747, over 21601.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0108, ecapa_loss=0.0001617, whisper_loss=0.09144, over 3880651.54 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:53:25,868 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 18:53:34,318 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.12 vs. limit=6.0 2024-08-13 18:53:37,756 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 18:53:41,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.413e+01 2.566e+01 3.082e+01 5.176e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-13 18:53:42,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2266410.0, ans=0.2 2024-08-13 18:53:59,857 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 18:54:16,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5 2024-08-13 18:54:28,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2266710.0, ans=0.1 2024-08-13 18:54:32,026 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9300, loss[loss=0.1096, beats_loss=0.01072, ecapa_loss=0.0001426, whisper_loss=0.09741, over 19219.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001616, whisper_loss=0.09176, over 3869680.12 frames. ], batch size: 74, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:55:03,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2267010.0, ans=0.125 2024-08-13 18:55:11,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2267010.0, ans=0.2 2024-08-13 18:55:18,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2267110.0, ans=0.1 2024-08-13 18:55:34,822 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 18:55:41,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9350, loss[loss=0.1192, beats_loss=0.006105, ecapa_loss=0.0002311, whisper_loss=0.1108, over 13865.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001609, whisper_loss=0.09221, over 3872104.68 frames. ], batch size: 54, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:55:48,917 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 18:55:52,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.31 vs. limit=15.0 2024-08-13 18:55:58,734 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.81 vs. limit=10.0 2024-08-13 18:55:58,982 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.391e+01 2.659e+01 2.911e+01 1.966e+02, threshold=5.317e+01, percent-clipped=2.0 2024-08-13 18:56:14,752 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 18:56:32,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2267610.0, ans=0.0 2024-08-13 18:56:36,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2267710.0, ans=0.0 2024-08-13 18:56:48,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9400, loss[loss=0.1214, beats_loss=0.008938, ecapa_loss=0.0001792, whisper_loss=0.1107, over 21896.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001607, whisper_loss=0.0923, over 3870661.25 frames. 
], batch size: 83, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:56:55,756 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:57:00,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2267910.0, ans=0.2 2024-08-13 18:57:01,826 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-13 18:57:04,869 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.822e-02 2024-08-13 18:57:10,098 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 18:57:10,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-08-13 18:57:28,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2268110.0, ans=0.125 2024-08-13 18:57:30,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2268110.0, ans=0.05 2024-08-13 18:57:31,280 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 18:57:37,775 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 18:57:38,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2268110.0, ans=0.0 2024-08-13 18:57:42,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2268210.0, ans=0.125 2024-08-13 18:57:46,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2268210.0, ans=0.125 2024-08-13 18:57:54,042 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 18:57:54,971 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9450, loss[loss=0.094, beats_loss=0.01155, ecapa_loss=0.0001471, whisper_loss=0.08098, over 22368.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001625, whisper_loss=0.09195, over 3830668.05 frames. ], batch size: 88, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:57:58,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2268310.0, ans=0.2 2024-08-13 18:58:02,546 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-13 18:58:08,800 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-08-13 18:58:12,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.364e+01 2.605e+01 2.951e+01 9.303e+01, threshold=5.211e+01, percent-clipped=2.0 2024-08-13 18:58:13,330 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 18:58:14,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2268410.0, ans=0.0 2024-08-13 18:58:22,131 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 18:58:36,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2268610.0, ans=0.125 2024-08-13 18:58:47,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2268710.0, ans=0.0 2024-08-13 18:58:54,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2268710.0, ans=0.1 2024-08-13 18:59:00,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9500, loss[loss=0.08441, beats_loss=0.01389, ecapa_loss=0.0001397, whisper_loss=0.06912, over 18857.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.000162, whisper_loss=0.09162, over 3825380.73 frames. ], batch size: 75, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:59:04,061 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-13 18:59:08,310 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-13 18:59:12,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.32 vs. limit=10.0 2024-08-13 18:59:26,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2269010.0, ans=0.0 2024-08-13 18:59:27,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.78 vs. 
limit=10.0 2024-08-13 18:59:38,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2269010.0, ans=0.125 2024-08-13 18:59:58,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=15.0 2024-08-13 19:00:06,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9550, loss[loss=0.1294, beats_loss=0.01053, ecapa_loss=0.0001778, whisper_loss=0.1171, over 23190.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01072, ecapa_loss=0.0001628, whisper_loss=0.09186, over 3839675.67 frames. ], batch size: 93, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:00:12,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2269310.0, ans=0.2 2024-08-13 19:00:20,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2269410.0, ans=0.0 2024-08-13 19:00:20,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0 2024-08-13 19:00:23,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.300e+01 2.521e+01 2.795e+01 4.846e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-13 19:00:37,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2269510.0, ans=0.2 2024-08-13 19:01:04,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2269710.0, ans=0.125 2024-08-13 19:01:07,819 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 19:01:11,687 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9600, loss[loss=0.09866, beats_loss=0.009634, ecapa_loss=0.0001957, whisper_loss=0.08706, over 19572.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001615, whisper_loss=0.09162, over 3844973.12 frames. ], batch size: 80, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:01:13,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2269810.0, ans=0.125 2024-08-13 19:01:19,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2269810.0, ans=0.1 2024-08-13 19:01:23,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2269910.0, ans=10.0 2024-08-13 19:01:28,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2269910.0, ans=0.125 2024-08-13 19:01:40,572 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 19:01:59,153 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=15.0 2024-08-13 19:02:02,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2270210.0, ans=0.125 2024-08-13 19:02:06,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2270210.0, ans=0.04949747468305833 2024-08-13 19:02:15,524 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 19:02:16,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9650, loss[loss=0.1106, beats_loss=0.009588, ecapa_loss=0.0001848, whisper_loss=0.09916, over 14902.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001618, whisper_loss=0.09133, over 3837561.64 frames. ], batch size: 58, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:02:24,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.16 vs. limit=6.0 2024-08-13 19:02:24,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2270310.0, ans=0.025 2024-08-13 19:02:28,448 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 25 from LS+wenet, 20 from Vox, 50 fro AS 2024-08-13 19:02:29,652 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 19:02:31,172 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 19:02:31,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2270410.0, ans=0.125 2024-08-13 19:02:31,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.53 vs. limit=10.0 2024-08-13 19:02:33,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.348e+01 2.592e+01 2.887e+01 4.146e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-13 19:02:37,748 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 17 from Vox, 25 from AS
2024-08-13 19:02:42,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2270510.0, ans=0.1
2024-08-13 19:03:03,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0
2024-08-13 19:03:04,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2270610.0, ans=0.0
2024-08-13 19:03:21,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9700, loss[loss=0.1177, beats_loss=0.009767, ecapa_loss=0.00012, whisper_loss=0.1067, over 23224.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001623, whisper_loss=0.09166, over 3834892.29 frames. ], batch size: 87, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:03:27,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2270810.0, ans=0.05
2024-08-13 19:03:43,729 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 from AS
2024-08-13 19:03:50,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2271010.0, ans=0.035
2024-08-13 19:03:52,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2271010.0, ans=0.2
2024-08-13 19:04:14,014 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 from AS
2024-08-13 19:04:16,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2271210.0, ans=0.125
2024-08-13 19:04:22,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2271210.0, ans=0.2
2024-08-13 19:04:26,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2271310.0, ans=0.125
2024-08-13 19:04:26,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9750, loss[loss=0.1163, beats_loss=0.008833, ecapa_loss=0.0002158, whisper_loss=0.1053, over 21962.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001623, whisper_loss=0.09101, over 3838385.17 frames. ], batch size: 93, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:04:28,404 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 from AS
2024-08-13 19:04:31,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0
2024-08-13 19:04:43,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.462e+01 2.717e+01 3.058e+01 5.863e+01, threshold=5.433e+01, percent-clipped=1.0
2024-08-13 19:04:44,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0
2024-08-13 19:04:47,828 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 from AS
2024-08-13 19:04:55,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2271510.0, ans=0.1
2024-08-13 19:05:02,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2271510.0, ans=0.125
2024-08-13 19:05:17,052 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 from AS
2024-08-13 19:05:32,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9800, loss[loss=0.1114, beats_loss=0.01018, ecapa_loss=0.0001477, whisper_loss=0.09969, over 15693.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001623, whisper_loss=0.09111, over 3835925.96 frames. ], batch size: 60, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:05:36,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2271810.0, ans=0.125
2024-08-13 19:05:41,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2271810.0, ans=0.125
2024-08-13 19:06:01,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2024-08-13 19:06:23,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2272210.0, ans=0.0
2024-08-13 19:06:24,547 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS
2024-08-13 19:06:25,880 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 19:06:29,553 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS
2024-08-13 19:06:37,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0
2024-08-13 19:06:37,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9850, loss[loss=0.1234, beats_loss=0.006451, ecapa_loss=0.0002151, whisper_loss=0.1148, over 15757.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001616, whisper_loss=0.09122, over 3837870.30 frames. ], batch size: 64, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:06:45,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0
2024-08-13 19:06:47,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2272310.0, ans=0.125
2024-08-13 19:06:50,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2272410.0, ans=0.125
2024-08-13 19:06:54,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.358e+01 2.690e+01 3.043e+01 6.098e+01, threshold=5.380e+01, percent-clipped=1.0
2024-08-13 19:07:01,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2272410.0, ans=0.125
2024-08-13 19:07:25,952 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 12 from Vox, 32 from AS
2024-08-13 19:07:35,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2272710.0, ans=0.0
2024-08-13 19:07:42,035 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0
2024-08-13 19:07:42,727 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9900, loss[loss=0.08688, beats_loss=0.01387, ecapa_loss=0.000141, whisper_loss=0.07161, over 18492.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001619, whisper_loss=0.09093, over 3873382.96 frames. ], batch size: 73, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:07:48,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2272810.0, ans=0.125
2024-08-13 19:07:53,318 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 22 from Vox, 28 from AS
2024-08-13 19:08:08,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2273010.0, ans=0.1
2024-08-13 19:08:31,769 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 15 from Vox, 48 from AS
2024-08-13 19:08:39,408 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 11 from Vox, 24 from AS
2024-08-13 19:08:42,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2273210.0, ans=0.015
2024-08-13 19:08:47,035 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 9950, loss[loss=0.09198, beats_loss=0.01223, ecapa_loss=0.0001571, whisper_loss=0.07818, over 22004.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.000162, whisper_loss=0.09106, over 3897391.48 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:08:51,087 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 from AS
2024-08-13 19:09:03,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2273410.0, ans=0.125
2024-08-13 19:09:04,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.452e+01 2.692e+01 3.113e+01 1.874e+02, threshold=5.385e+01, percent-clipped=3.0
2024-08-13 19:09:14,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2273510.0, ans=0.0
2024-08-13 19:09:23,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2273510.0, ans=0.125
2024-08-13 19:09:33,549 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 from AS
2024-08-13 19:09:36,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2273610.0, ans=0.0
2024-08-13 19:09:52,228 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10000, loss[loss=0.1214, beats_loss=0.008379, ecapa_loss=0.0001882, whisper_loss=0.1111, over 21618.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001625, whisper_loss=0.09077, over 3849504.52 frames. ], batch size: 85, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:09:56,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.77 vs. limit=22.5
2024-08-13 19:10:03,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2273910.0, ans=0.0
2024-08-13 19:10:12,736 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 from AS
2024-08-13 19:10:14,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5
2024-08-13 19:10:19,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2274010.0, ans=0.125
2024-08-13 19:10:23,614 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0
2024-08-13 19:10:33,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2274110.0, ans=0.0
2024-08-13 19:10:34,189 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 from AS
2024-08-13 19:10:41,850 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 25 from Vox, 33 from AS
2024-08-13 19:10:47,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2274210.0, ans=0.0
2024-08-13 19:10:57,529 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10050, loss[loss=0.09939, beats_loss=0.009159, ecapa_loss=0.0001929, whisper_loss=0.0883, over 21367.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001619, whisper_loss=0.09062, over 3836767.90 frames. ], batch size: 88, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:11:12,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2274410.0, ans=0.0
2024-08-13 19:11:14,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.462e+01 2.706e+01 3.125e+01 1.991e+02, threshold=5.413e+01, percent-clipped=1.0
2024-08-13 19:11:14,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2274410.0, ans=0.025
2024-08-13 19:11:22,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2274510.0, ans=0.125
2024-08-13 19:11:23,131 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 27 from LS+wenet, 15 from Vox, 21 from AS
2024-08-13 19:11:32,038 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 12 from Vox, 28 from AS
2024-08-13 19:11:36,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2274610.0, ans=0.2
2024-08-13 19:11:44,841 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 20 from Vox, 50 from AS
2024-08-13 19:12:01,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10100, loss[loss=0.1195, beats_loss=0.01092, ecapa_loss=0.0001327, whisper_loss=0.1072, over 21039.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001618, whisper_loss=0.0911, over 3850679.23 frames. ], batch size: 83, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:12:14,218 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 from AS
2024-08-13 19:12:23,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5
2024-08-13 19:12:25,965 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 from AS
2024-08-13 19:12:40,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0
2024-08-13 19:12:41,996 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 from AS
2024-08-13 19:12:43,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2275110.0, ans=0.125
2024-08-13 19:12:50,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2275110.0, ans=0.0
2024-08-13 19:13:06,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10150, loss[loss=0.1148, beats_loss=0.0104, ecapa_loss=0.000192, whisper_loss=0.1025, over 19891.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001634, whisper_loss=0.09189, over 3879621.91 frames. ], batch size: 83, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:13:18,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2275310.0, ans=0.09899494936611666
2024-08-13 19:13:24,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.395e+01 2.644e+01 2.917e+01 4.595e+01, threshold=5.288e+01, percent-clipped=0.0
2024-08-13 19:13:44,859 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 from AS
2024-08-13 19:14:15,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10200, loss[loss=0.09481, beats_loss=0.0121, ecapa_loss=0.0001514, whisper_loss=0.08119, over 22737.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001624, whisper_loss=0.09168, over 3862756.32 frames. ], batch size: 94, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:14:20,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5
2024-08-13 19:14:37,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2275910.0, ans=0.125
2024-08-13 19:14:51,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2276010.0, ans=0.125
2024-08-13 19:15:19,005 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 23 from Vox, 18 from AS
2024-08-13 19:15:26,047 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 17 from Vox, 40 from AS
2024-08-13 19:15:31,277 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10250, loss[loss=0.09938, beats_loss=0.01173, ecapa_loss=0.0001364, whisper_loss=0.08629, over 15230.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01071, ecapa_loss=0.0001633, whisper_loss=0.09221, over 3895515.29 frames. ], batch size: 58, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:15:36,420 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS
2024-08-13 19:15:48,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2276410.0, ans=0.0
2024-08-13 19:15:53,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.373e+01 2.735e+01 3.155e+01 5.239e+01, threshold=5.471e+01, percent-clipped=0.0
2024-08-13 19:15:53,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2276410.0, ans=0.125
2024-08-13 19:16:00,939 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 from AS
2024-08-13 19:16:11,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-08-13 19:16:31,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=22.5
2024-08-13 19:16:42,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2276710.0, ans=0.125
2024-08-13 19:16:46,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10300, loss[loss=0.08836, beats_loss=0.0144, ecapa_loss=0.0001339, whisper_loss=0.07262, over 22561.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.000163, whisper_loss=0.09134, over 3919301.16 frames. ], batch size: 93, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:16:55,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2276810.0, ans=10.0
2024-08-13 19:16:56,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2276810.0, ans=0.125
2024-08-13 19:17:15,398 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 from AS
2024-08-13 19:17:30,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2277010.0, ans=0.2
2024-08-13 19:17:35,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2277110.0, ans=0.125
2024-08-13 19:17:36,675 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 from AS
2024-08-13 19:17:51,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2277210.0, ans=0.125
2024-08-13 19:17:58,714 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 21 from Vox, 33 from AS
2024-08-13 19:17:59,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2277210.0, ans=0.0
2024-08-13 19:18:04,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10350, loss[loss=0.09612, beats_loss=0.01056, ecapa_loss=0.0001876, whisper_loss=0.08369, over 17629.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.000163, whisper_loss=0.09126, over 3946844.27 frames. ], batch size: 74, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:18:15,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2277310.0, ans=0.125
2024-08-13 19:18:15,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2277310.0, ans=0.125
2024-08-13 19:18:26,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.410e+01 2.741e+01 3.127e+01 1.313e+02, threshold=5.482e+01, percent-clipped=3.0
2024-08-13 19:18:55,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2277610.0, ans=0.2
2024-08-13 19:19:02,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2277610.0, ans=0.1
2024-08-13 19:19:03,007 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 16 from Vox, 34 from AS
2024-08-13 19:19:03,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2277610.0, ans=0.125
2024-08-13 19:19:07,231 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 19:19:11,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2277710.0, ans=0.025
2024-08-13 19:19:15,868 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 from AS
2024-08-13 19:19:18,941 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 from AS
2024-08-13 19:19:20,760 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10400, loss[loss=0.09364, beats_loss=0.01209, ecapa_loss=0.0001577, whisper_loss=0.07998, over 21699.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001629, whisper_loss=0.09077, over 3910018.28 frames. ], batch size: 89, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:19:39,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2277910.0, ans=0.125
2024-08-13 19:19:43,119 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 from AS
2024-08-13 19:19:44,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2277910.0, ans=0.125
2024-08-13 19:19:46,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2277910.0, ans=0.125
2024-08-13 19:19:55,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2278010.0, ans=0.125
2024-08-13 19:20:02,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2278010.0, ans=0.125
2024-08-13 19:20:17,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0
2024-08-13 19:20:33,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10450, loss[loss=0.09795, beats_loss=0.009517, ecapa_loss=0.0001307, whisper_loss=0.08712, over 15459.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.000162, whisper_loss=0.09062, over 3883063.00 frames. ], batch size: 57, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:20:48,694 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0
2024-08-13 19:20:51,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2278410.0, ans=15.0
2024-08-13 19:20:55,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.408e+01 2.686e+01 2.992e+01 7.083e+01, threshold=5.372e+01, percent-clipped=1.0
2024-08-13 19:20:55,399 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 from AS
2024-08-13 19:21:10,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2278510.0, ans=0.0
2024-08-13 19:21:17,625 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 15 from Vox, 29 from AS
2024-08-13 19:21:17,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2278610.0, ans=0.125
2024-08-13 19:21:23,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2278610.0, ans=0.2
2024-08-13 19:21:23,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2278610.0, ans=0.2
2024-08-13 19:21:49,442 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10500, loss[loss=0.1073, beats_loss=0.01059, ecapa_loss=0.0001735, whisper_loss=0.09498, over 22244.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001627, whisper_loss=0.09076, over 3862093.87 frames. ], batch size: 90, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:21:56,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0
2024-08-13 19:22:04,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2278910.0, ans=0.1
2024-08-13 19:22:14,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2278910.0, ans=0.125
2024-08-13 19:22:55,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2279210.0, ans=0.09899494936611666
2024-08-13 19:23:05,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2279310.0, ans=0.125
2024-08-13 19:23:05,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10550, loss[loss=0.09189, beats_loss=0.008428, ecapa_loss=0.000251, whisper_loss=0.08096, over 19245.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001644, whisper_loss=0.09034, over 3850464.74 frames. ], batch size: 85, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:23:16,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2279310.0, ans=0.1
2024-08-13 19:23:17,822 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 from AS
2024-08-13 19:23:29,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.431e+01 2.823e+01 3.244e+01 7.825e+01, threshold=5.646e+01, percent-clipped=1.0
2024-08-13 19:23:38,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2279510.0, ans=0.0
2024-08-13 19:23:52,703 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 28 from Vox, 31 from AS
2024-08-13 19:24:04,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2279610.0, ans=0.035
2024-08-13 19:24:13,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2279710.0, ans=0.2
2024-08-13 19:24:26,761 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10600, loss[loss=0.09465, beats_loss=0.01268, ecapa_loss=0.0001496, whisper_loss=0.08047, over 22809.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01079, ecapa_loss=0.0001639, whisper_loss=0.09035, over 3896852.51 frames. ], batch size: 93, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:24:28,699 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-13 19:24:35,557 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS
2024-08-13 19:24:39,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2279810.0, ans=0.2
2024-08-13 19:24:43,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2279910.0, ans=0.0
2024-08-13 19:24:45,207 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.402e+01
2024-08-13 19:25:01,721 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.651e+01
2024-08-13 19:25:10,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2280010.0, ans=0.0
2024-08-13 19:25:26,143 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 from AS
2024-08-13 19:25:51,503 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10650, loss[loss=0.08616, beats_loss=0.01227, ecapa_loss=0.000181, whisper_loss=0.07208, over 21075.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001621, whisper_loss=0.09045, over 3918219.06 frames. ], batch size: 90, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:26:11,896 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 35 from LS+wenet, 18 from Vox, 33 from AS
2024-08-13 19:26:15,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.293e+01 2.581e+01 2.913e+01 4.333e+01, threshold=5.161e+01, percent-clipped=0.0
2024-08-13 19:26:28,046 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS
2024-08-13 19:27:07,858 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0
2024-08-13 19:27:09,235 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=12.0
2024-08-13 19:27:09,342 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0
2024-08-13 19:27:13,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10700, loss[loss=0.1183, beats_loss=0.01005, ecapa_loss=0.0001346, whisper_loss=0.1069, over 17275.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001624, whisper_loss=0.09147, over 3903877.44 frames. ], batch size: 62, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:27:18,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2280810.0, ans=0.125
2024-08-13 19:27:30,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2280910.0, ans=0.125
2024-08-13 19:27:44,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2281010.0, ans=0.125
2024-08-13 19:28:01,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2281110.0, ans=0.0
2024-08-13 19:28:08,359 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 from AS
2024-08-13 19:28:16,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2281210.0, ans=0.0
2024-08-13 19:28:19,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.10 vs. limit=22.5
2024-08-13 19:28:26,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2281210.0, ans=0.125
2024-08-13 19:28:34,653 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10750, loss[loss=0.1037, beats_loss=0.009191, ecapa_loss=0.000172, whisper_loss=0.09277, over 21426.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001628, whisper_loss=0.0909, over 3911059.95 frames. ], batch size: 87, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:28:44,179 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 from AS
2024-08-13 19:28:54,607 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 from AS
2024-08-13 19:28:57,099 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.479e+01 2.782e+01 3.163e+01 7.452e+01, threshold=5.564e+01, percent-clipped=1.0
2024-08-13 19:28:59,702 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 from AS
2024-08-13 19:29:07,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2281510.0, ans=0.0
2024-08-13 19:29:18,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5
2024-08-13 19:29:35,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=12.0
2024-08-13 19:29:36,256 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 18 from Vox, 36 from AS
2024-08-13 19:29:37,714 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 23 from LS+wenet, 10 from Vox, 21 from AS
2024-08-13 19:29:39,328 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 27 from Vox, 25 from AS
2024-08-13 19:29:55,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10800, loss[loss=0.09563, beats_loss=0.01176, ecapa_loss=0.0001557, whisper_loss=0.08232, over 19543.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001627, whisper_loss=0.09161, over 3884463.04 frames. ], batch size: 79, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:29:56,089 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 18 from Vox, 50 from AS
2024-08-13 19:30:06,850 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 14 from Vox, 37 from AS
2024-08-13 19:30:13,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2281910.0, ans=0.125
2024-08-13 19:30:23,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0
2024-08-13 19:30:25,843 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 from AS
2024-08-13 19:30:41,878 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 from AS
2024-08-13 19:30:47,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2282110.0, ans=0.5
2024-08-13 19:31:16,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10850, loss[loss=0.1064, beats_loss=0.01327, ecapa_loss=0.0001629, whisper_loss=0.09152, over 23051.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001618, whisper_loss=0.09201, over 3905278.43 frames. ], batch size: 95, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:31:16,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2282310.0, ans=0.025
2024-08-13 19:31:28,920 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS
2024-08-13 19:31:30,853 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 from AS
2024-08-13 19:31:38,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.579e+01 2.775e+01 3.149e+01 7.029e+01, threshold=5.550e+01, percent-clipped=1.0
2024-08-13 19:31:47,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5
2024-08-13 19:31:51,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2282510.0, ans=0.125
2024-08-13 19:32:03,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2282510.0, ans=0.125
2024-08-13 19:32:40,033 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10900, loss[loss=0.08535, beats_loss=0.009873, ecapa_loss=0.0001997, whisper_loss=0.07348, over 16646.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001622, whisper_loss=0.0919, over 3947561.97 frames. ], batch size: 73, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:32:41,053 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 from AS
2024-08-13 19:32:56,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2282910.0, ans=0.0
2024-08-13 19:32:56,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2282910.0, ans=0.2
2024-08-13 19:32:59,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2282910.0, ans=0.125
2024-08-13 19:33:03,132 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS
2024-08-13 19:33:07,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2282910.0, ans=0.0
2024-08-13 19:33:11,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2283010.0, ans=0.0
2024-08-13 19:33:46,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0
2024-08-13 19:33:58,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2283210.0, ans=0.125
2024-08-13 19:34:00,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 10950, loss[loss=0.1016, beats_loss=0.01254, ecapa_loss=0.0001716, whisper_loss=0.0873, over 21060.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001623, whisper_loss=0.09149, over 3950201.15 frames. ], batch size: 90, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:34:01,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2283310.0, ans=0.0
2024-08-13 19:34:04,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0
2024-08-13 19:34:23,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.358e+01 2.590e+01 2.815e+01 3.849e+01, threshold=5.179e+01, percent-clipped=0.0
2024-08-13 19:34:30,870 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 31 from LS+wenet, 19 from Vox, 25 from AS
2024-08-13 19:34:40,488 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=12.0
2024-08-13 19:34:47,181 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 from AS
2024-08-13 19:35:22,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11000, loss[loss=0.1109, beats_loss=0.01064, ecapa_loss=0.0001595, whisper_loss=0.09867, over 18787.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001637, whisper_loss=0.09167, over 3970240.95 frames. ], batch size: 77, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:35:42,491 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 19:35:43,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2283910.0, ans=0.125 2024-08-13 19:35:52,544 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 19:35:57,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2284010.0, ans=0.0 2024-08-13 19:36:28,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2284210.0, ans=0.125 2024-08-13 19:36:32,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.17 vs. limit=22.5 2024-08-13 19:36:35,507 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 17 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-13 19:36:45,133 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11050, loss[loss=0.09658, beats_loss=0.01114, ecapa_loss=0.0001581, whisper_loss=0.08386, over 22901.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.000164, whisper_loss=0.09193, over 3943744.38 frames. ], batch size: 92, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:36:50,337 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. 
limit=6.0 2024-08-13 19:37:02,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2284410.0, ans=0.0 2024-08-13 19:37:04,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2284410.0, ans=0.125 2024-08-13 19:37:08,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.460e+01 2.684e+01 3.016e+01 4.539e+01, threshold=5.368e+01, percent-clipped=0.0 2024-08-13 19:37:33,176 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 19:37:37,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2284610.0, ans=0.125 2024-08-13 19:37:45,857 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2024-08-13 19:38:07,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11100, loss[loss=0.09324, beats_loss=0.01518, ecapa_loss=0.0001272, whisper_loss=0.07679, over 22795.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001629, whisper_loss=0.09151, over 3915812.18 frames. ], batch size: 90, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:38:21,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2284810.0, ans=0.0 2024-08-13 19:38:44,610 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-13 19:38:54,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2285010.0, ans=0.1 2024-08-13 19:38:57,459 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 19:39:10,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2024-08-13 19:39:18,013 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 19:39:22,098 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 19:39:23,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2285210.0, ans=0.125 2024-08-13 19:39:28,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2285310.0, ans=0.125 2024-08-13 19:39:29,456 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11150, loss[loss=0.1019, beats_loss=0.01183, ecapa_loss=0.0001248, whisper_loss=0.08881, over 19997.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001618, whisper_loss=0.09132, over 3908032.27 frames. ], batch size: 76, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:39:32,997 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 16 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 19:39:46,391 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-13 19:39:51,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2285410.0, ans=0.2 2024-08-13 19:39:52,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.378e+01 2.624e+01 2.890e+01 4.520e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-13 19:39:53,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2285410.0, ans=0.0 2024-08-13 19:39:59,563 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 19:40:15,734 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 19:40:19,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2285610.0, ans=0.035 2024-08-13 19:40:22,528 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-13 19:40:22,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2285610.0, ans=0.2 2024-08-13 19:40:51,604 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 19:40:51,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2285810.0, ans=0.0 2024-08-13 19:40:52,829 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11200, loss[loss=0.07684, beats_loss=0.01289, ecapa_loss=0.0001686, whisper_loss=0.06227, over 16811.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001622, whisper_loss=0.09149, over 3868218.69 frames. ], batch size: 67, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:40:56,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2024-08-13 19:41:09,893 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 26 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 19:41:15,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2285910.0, ans=0.2 2024-08-13 19:41:22,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=12.0 2024-08-13 19:41:25,039 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
24 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 19:41:43,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2286110.0, ans=0.1 2024-08-13 19:41:55,990 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 19:42:02,475 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 19:42:10,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2286210.0, ans=0.125 2024-08-13 19:42:13,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11250, loss[loss=0.1179, beats_loss=0.009382, ecapa_loss=0.0001287, whisper_loss=0.1072, over 16466.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001622, whisper_loss=0.09183, over 3892760.01 frames. ], batch size: 62, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:42:27,175 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 19:42:34,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.479e+01 2.663e+01 3.017e+01 9.282e+01, threshold=5.327e+01, percent-clipped=2.0 2024-08-13 19:42:34,842 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
38 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-13 19:42:43,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2286510.0, ans=0.125 2024-08-13 19:42:55,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2286510.0, ans=0.0 2024-08-13 19:43:26,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2286710.0, ans=0.125 2024-08-13 19:43:26,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2024-08-13 19:43:31,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11300, loss[loss=0.1022, beats_loss=0.0123, ecapa_loss=0.0002033, whisper_loss=0.08791, over 20449.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.0001605, whisper_loss=0.09205, over 3885134.58 frames. ], batch size: 84, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:43:35,072 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.863e+01 2024-08-13 19:43:58,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2024-08-13 19:44:09,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-13 19:44:19,968 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 19:44:22,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.33 vs. 
limit=15.0 2024-08-13 19:44:23,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2287110.0, ans=0.1 2024-08-13 19:44:24,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2287110.0, ans=0.125 2024-08-13 19:44:27,211 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 19:44:27,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2287110.0, ans=0.125 2024-08-13 19:44:48,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2287210.0, ans=0.125 2024-08-13 19:44:53,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11350, loss[loss=0.09518, beats_loss=0.01256, ecapa_loss=0.0001395, whisper_loss=0.08122, over 18343.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.0001604, whisper_loss=0.09184, over 3892268.86 frames. ], batch size: 73, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:44:57,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2287310.0, ans=0.025 2024-08-13 19:45:06,274 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 16 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 19:45:09,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=12.0 2024-08-13 19:45:15,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.325e+01 2.679e+01 3.004e+01 6.399e+01, threshold=5.358e+01, percent-clipped=2.0 2024-08-13 19:45:44,288 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 19:45:54,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2024-08-13 19:46:02,147 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 19:46:08,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2287710.0, ans=0.0 2024-08-13 19:46:14,164 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11400, loss[loss=0.1223, beats_loss=0.01126, ecapa_loss=0.0001295, whisper_loss=0.1097, over 23695.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.0001607, whisper_loss=0.0919, over 3905857.96 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:46:22,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2287810.0, ans=0.1 2024-08-13 19:47:08,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2288110.0, ans=0.125 2024-08-13 19:47:18,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2288210.0, ans=0.0 2024-08-13 19:47:30,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2288210.0, ans=0.1 2024-08-13 19:47:31,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2288210.0, ans=0.125 2024-08-13 19:47:33,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11450, loss[loss=0.117, beats_loss=0.008324, ecapa_loss=0.0001708, whisper_loss=0.107, over 16985.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01073, ecapa_loss=0.00016, whisper_loss=0.09257, over 3911145.57 frames. ], batch size: 68, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:47:37,749 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-13 19:47:46,517 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 19:47:46,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2288310.0, ans=0.0 2024-08-13 19:47:49,339 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-13 19:47:55,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.447e+01 2.680e+01 2.957e+01 5.322e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-13 19:48:03,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2288510.0, ans=0.2 2024-08-13 19:48:06,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2288510.0, ans=0.1 2024-08-13 19:48:07,419 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-13 19:48:15,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2288510.0, ans=0.125 2024-08-13 19:48:26,772 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 19:48:36,707 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 19:48:42,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2288710.0, ans=0.0 2024-08-13 19:48:48,929 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 19:48:51,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11500, loss[loss=0.1213, beats_loss=0.01008, ecapa_loss=0.0001809, whisper_loss=0.1094, over 22544.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.000161, whisper_loss=0.09222, over 3940389.83 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:48:57,052 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 19:49:20,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2288910.0, ans=0.2 2024-08-13 19:49:21,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2288910.0, ans=0.125 2024-08-13 19:49:31,529 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 19:49:34,361 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 19:49:38,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2289110.0, ans=0.0 2024-08-13 19:49:43,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2289110.0, ans=0.2 2024-08-13 19:49:47,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2289110.0, ans=0.0 2024-08-13 19:49:47,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2289110.0, ans=0.1 2024-08-13 19:49:53,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2289110.0, ans=0.04949747468305833 2024-08-13 19:49:54,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2289210.0, ans=0.125 2024-08-13 19:49:56,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2289210.0, ans=0.125 2024-08-13 19:50:13,196 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11550, loss[loss=0.1303, beats_loss=0.008059, ecapa_loss=0.0002038, whisper_loss=0.1202, over 18818.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01072, ecapa_loss=0.0001611, whisper_loss=0.09189, over 3932367.38 frames. ], batch size: 72, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:50:26,557 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 19:50:36,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.552e+01 2.829e+01 3.234e+01 6.675e+01, threshold=5.658e+01, percent-clipped=2.0 2024-08-13 19:50:46,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2289510.0, ans=0.1 2024-08-13 19:51:11,482 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 19:51:26,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-13 19:51:34,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2289810.0, ans=0.0 2024-08-13 19:51:35,566 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11600, loss[loss=0.08495, beats_loss=0.01092, ecapa_loss=0.0001597, whisper_loss=0.07244, over 16002.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001608, whisper_loss=0.09199, over 3912267.23 frames. ], batch size: 68, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:51:35,719 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 19:52:27,510 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 19:52:33,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2290110.0, ans=0.125 2024-08-13 19:52:42,250 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 19:52:49,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. 
limit=15.0 2024-08-13 19:52:51,061 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 19:52:59,411 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11650, loss[loss=0.1144, beats_loss=0.01025, ecapa_loss=0.0001567, whisper_loss=0.1026, over 22554.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001606, whisper_loss=0.09153, over 3908167.72 frames. ], batch size: 91, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:53:21,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2290410.0, ans=0.125 2024-08-13 19:53:22,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.408e+01 2.632e+01 2.967e+01 4.953e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-13 19:53:31,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2290510.0, ans=0.125 2024-08-13 19:53:40,297 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 19:53:49,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2290610.0, ans=0.0 2024-08-13 19:53:56,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2024-08-13 19:54:06,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2290710.0, ans=0.125 2024-08-13 19:54:08,047 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
22 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 19:54:19,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2290710.0, ans=0.0 2024-08-13 19:54:23,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11700, loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.000135, whisper_loss=0.09078, over 16391.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001603, whisper_loss=0.09161, over 3933564.04 frames. ], batch size: 62, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:54:25,733 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 19:54:42,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2290910.0, ans=0.125 2024-08-13 19:54:48,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2290910.0, ans=0.0 2024-08-13 19:54:57,017 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-13 19:55:09,713 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-13 19:55:17,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2291110.0, ans=0.125 2024-08-13 19:55:29,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2024-08-13 19:55:35,923 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 19:55:46,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11750, loss[loss=0.1035, beats_loss=0.01186, ecapa_loss=0.0001639, whisper_loss=0.08998, over 19174.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.011, ecapa_loss=0.00016, whisper_loss=0.09089, over 3917807.40 frames. ], batch size: 81, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:55:49,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0 2024-08-13 19:56:06,672 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 19:56:06,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2291410.0, ans=0.125 2024-08-13 19:56:11,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.398e+01 2.617e+01 2.949e+01 4.150e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 19:56:23,505 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 19:56:37,142 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-13 19:56:41,571 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 32 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 19:56:45,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2291610.0, ans=0.1 2024-08-13 19:57:04,997 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 19:57:09,096 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11800, loss[loss=0.09611, beats_loss=0.01254, ecapa_loss=0.0001954, whisper_loss=0.08162, over 14460.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01105, ecapa_loss=0.00016, whisper_loss=0.09039, over 3900417.40 frames. 
], batch size: 63, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:58:06,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2292110.0, ans=0.2 2024-08-13 19:58:32,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11850, loss[loss=0.09611, beats_loss=0.01159, ecapa_loss=0.0001337, whisper_loss=0.08318, over 16932.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01101, ecapa_loss=0.0001602, whisper_loss=0.09083, over 3876997.59 frames. ], batch size: 64, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:58:34,435 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 19:58:39,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0 2024-08-13 19:58:41,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2292310.0, ans=0.125 2024-08-13 19:58:44,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2024-08-13 19:58:45,343 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 19:58:53,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2292410.0, ans=0.1 2024-08-13 19:58:55,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.456e+01 2.721e+01 2.965e+01 7.443e+01, threshold=5.443e+01, percent-clipped=2.0 2024-08-13 19:59:07,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.33 vs. 
limit=22.5 2024-08-13 19:59:17,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2024-08-13 19:59:17,876 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 19:59:34,295 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2024-08-13 19:59:39,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2024-08-13 19:59:53,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11900, loss[loss=0.1073, beats_loss=0.01097, ecapa_loss=0.0001563, whisper_loss=0.09474, over 22408.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001607, whisper_loss=0.09101, over 3892633.62 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:59:53,202 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 20:00:06,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2292810.0, ans=0.0 2024-08-13 20:00:24,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2293010.0, ans=0.0 2024-08-13 20:00:32,482 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 20:00:39,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.29 vs. limit=10.0 2024-08-13 20:01:03,034 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 20:01:13,031 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 11950, loss[loss=0.09778, beats_loss=0.01141, ecapa_loss=0.0001673, whisper_loss=0.0847, over 21855.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01092, ecapa_loss=0.000161, whisper_loss=0.09023, over 3838873.24 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:01:17,470 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 20:01:33,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2293410.0, ans=0.0 2024-08-13 20:01:35,778 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.292e+01 2.621e+01 2.963e+01 5.710e+01, threshold=5.241e+01, percent-clipped=1.0 2024-08-13 20:01:36,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2293410.0, ans=0.125 2024-08-13 20:01:51,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2293510.0, ans=0.125 2024-08-13 20:01:52,676 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-13 20:01:54,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2293510.0, ans=0.0 2024-08-13 20:02:10,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.71 vs. limit=10.0 2024-08-13 20:02:12,588 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
24 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 20:02:20,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2293710.0, ans=0.0 2024-08-13 20:02:20,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.15 vs. limit=15.0 2024-08-13 20:02:23,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2293710.0, ans=0.0 2024-08-13 20:02:28,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2293710.0, ans=0.0 2024-08-13 20:02:28,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2293710.0, ans=0.125 2024-08-13 20:02:32,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12000, loss[loss=0.09085, beats_loss=0.01228, ecapa_loss=0.0001513, whisper_loss=0.07706, over 22594.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01093, ecapa_loss=0.000161, whisper_loss=0.08951, over 3818275.66 frames. ], batch size: 91, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:02:32,098 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 20:03:01,229 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5265, 4.2923, 3.4379, 3.7522], device='cuda:1') 2024-08-13 20:03:12,724 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005542, whisper_loss=0.248, over 922467.00 frames. 2024-08-13 20:03:33,658 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on SV_voxceleb1: loss=0.004415, beats_loss=0, ecapa_loss=0.0004415, whisper_loss=0, over 939242.00 frames. 
2024-08-13 20:03:58,297 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3582, 2.7956, 2.2327, 3.0420], device='cuda:1') 2024-08-13 20:05:21,748 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on AT_audioset: loss=0.02371, beats_loss=0.02371, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 20:05:21,752 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-13 20:05:34,078 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 20:05:37,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2293910.0, ans=0.125 2024-08-13 20:05:57,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2294010.0, ans=0.2 2024-08-13 20:06:22,861 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 20:06:36,157 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 20:06:37,479 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 20:06:40,870 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 20:06:44,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12050, loss[loss=0.1086, beats_loss=0.01186, ecapa_loss=0.0001538, whisper_loss=0.0952, over 21978.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01093, ecapa_loss=0.0001617, whisper_loss=0.08958, over 3819927.74 frames. ], batch size: 87, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:06:59,451 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 20:07:07,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.498e+01 2.752e+01 3.060e+01 1.760e+02, threshold=5.504e+01, percent-clipped=2.0 2024-08-13 20:07:18,056 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 20:07:20,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-13 20:07:24,619 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 20:07:27,247 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-13 20:07:55,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2294710.0, ans=0.125 2024-08-13 20:07:56,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2294710.0, ans=0.125 2024-08-13 20:08:08,960 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12100, loss[loss=0.1078, beats_loss=0.01144, ecapa_loss=0.0001409, whisper_loss=0.09495, over 21750.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01092, ecapa_loss=0.0001623, whisper_loss=0.08932, over 3805801.59 frames. ], batch size: 86, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:08:10,566 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 20:08:24,779 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
36 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 20:08:30,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2294910.0, ans=0.0 2024-08-13 20:08:33,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2294910.0, ans=0.2 2024-08-13 20:08:34,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2294910.0, ans=0.1 2024-08-13 20:08:37,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2294910.0, ans=0.0 2024-08-13 20:08:40,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2024-08-13 20:08:53,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2295010.0, ans=0.1 2024-08-13 20:09:11,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2295110.0, ans=0.1 2024-08-13 20:09:26,326 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 20:09:29,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12150, loss[loss=0.07232, beats_loss=0.01254, ecapa_loss=0.0001327, whisper_loss=0.05845, over 18778.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0109, ecapa_loss=0.0001619, whisper_loss=0.08925, over 3806703.90 frames. ], batch size: 77, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:09:45,189 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.02 vs. 
limit=15.0 2024-08-13 20:09:52,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.320e+01 2.600e+01 2.810e+01 5.391e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-13 20:09:59,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2295410.0, ans=0.0 2024-08-13 20:10:14,006 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.911e+05 2024-08-13 20:10:18,443 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 20:10:29,581 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 20:10:29,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2295610.0, ans=0.0 2024-08-13 20:10:32,838 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 20:10:48,127 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 20:10:49,620 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 20:10:55,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12200, loss[loss=0.1004, beats_loss=0.01099, ecapa_loss=0.0001377, whisper_loss=0.08799, over 21223.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01089, ecapa_loss=0.0001608, whisper_loss=0.08946, over 3826687.32 frames. ], batch size: 84, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:10:58,458 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 20:11:01,506 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.050e+01 2024-08-13 20:11:15,325 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 20:11:30,462 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 20:11:31,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2296010.0, ans=0.125 2024-08-13 20:11:33,710 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-13 20:11:55,085 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 20:12:10,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2296210.0, ans=0.125 2024-08-13 20:12:16,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2296310.0, ans=0.125 2024-08-13 20:12:16,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12250, loss[loss=0.1052, beats_loss=0.01109, ecapa_loss=0.0001528, whisper_loss=0.09263, over 21386.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01086, ecapa_loss=0.0001612, whisper_loss=0.08989, over 3830981.50 frames. ], batch size: 84, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:12:20,376 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 20:12:38,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.422e+01 2.679e+01 2.912e+01 4.424e+01, threshold=5.358e+01, percent-clipped=0.0 2024-08-13 20:12:41,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2296410.0, ans=0.0 2024-08-13 20:12:46,541 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-13 20:12:58,870 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
12 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 20:13:06,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2296610.0, ans=0.5 2024-08-13 20:13:18,718 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 20:13:31,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2296710.0, ans=0.125 2024-08-13 20:13:36,240 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12300, loss[loss=0.1139, beats_loss=0.009775, ecapa_loss=0.0001644, whisper_loss=0.1024, over 22041.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001617, whisper_loss=0.09069, over 3842633.56 frames. ], batch size: 88, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:13:43,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2024-08-13 20:13:55,454 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 20:14:42,340 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 29 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 20:14:43,606 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 20:14:45,191 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 20:14:55,177 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12350, loss[loss=0.07571, beats_loss=0.01342, ecapa_loss=0.0001407, whisper_loss=0.06089, over 19430.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01088, ecapa_loss=0.0001622, whisper_loss=0.09019, over 3853049.23 frames. 
], batch size: 81, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:15:12,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2297410.0, ans=0.2 2024-08-13 20:15:15,003 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 20:15:15,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2297410.0, ans=0.125 2024-08-13 20:15:18,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.416e+01 2.631e+01 3.029e+01 4.449e+01, threshold=5.262e+01, percent-clipped=0.0 2024-08-13 20:15:31,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2297510.0, ans=0.1 2024-08-13 20:15:56,876 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 20:16:18,952 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12400, loss[loss=0.1169, beats_loss=0.01019, ecapa_loss=0.0001811, whisper_loss=0.1049, over 17984.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01087, ecapa_loss=0.0001618, whisper_loss=0.08993, over 3839721.47 frames. ], batch size: 71, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:16:22,305 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 20:16:26,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2297810.0, ans=0.0 2024-08-13 20:16:29,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2297810.0, ans=0.125 2024-08-13 20:16:35,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2297910.0, ans=0.125 2024-08-13 20:17:10,554 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 20:17:17,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2298110.0, ans=0.125 2024-08-13 20:17:22,200 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 20:17:23,799 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 20:17:32,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2298210.0, ans=0.0 2024-08-13 20:17:36,443 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12450, loss[loss=0.09332, beats_loss=0.01328, ecapa_loss=0.0001695, whisper_loss=0.07835, over 18684.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01086, ecapa_loss=0.0001618, whisper_loss=0.08993, over 3851249.72 frames. ], batch size: 80, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:17:38,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2298310.0, ans=0.0 2024-08-13 20:17:51,583 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 20:17:57,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.445e+01 2.805e+01 3.307e+01 1.075e+02, threshold=5.611e+01, percent-clipped=1.0 2024-08-13 20:18:05,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2298510.0, ans=0.125 2024-08-13 20:18:12,300 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 20:18:19,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.46 vs. limit=10.0 2024-08-13 20:18:51,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2298710.0, ans=0.125 2024-08-13 20:18:53,366 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12500, loss[loss=0.1101, beats_loss=0.01048, ecapa_loss=0.0001947, whisper_loss=0.09763, over 22522.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001618, whisper_loss=0.09049, over 3832271.32 frames. ], batch size: 94, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:19:02,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2298810.0, ans=0.5 2024-08-13 20:19:08,670 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 20:19:13,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2298910.0, ans=0.125 2024-08-13 20:19:13,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. 
limit=15.0 2024-08-13 20:19:18,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2298910.0, ans=0.125 2024-08-13 20:19:45,576 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 20:20:13,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12550, loss[loss=0.1047, beats_loss=0.009491, ecapa_loss=0.0001802, whisper_loss=0.09338, over 20946.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001622, whisper_loss=0.09056, over 3863153.81 frames. ], batch size: 85, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:20:31,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2024-08-13 20:20:37,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.461e+01 2.791e+01 3.123e+01 5.243e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 20:20:38,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2299410.0, ans=0.125 2024-08-13 20:20:43,121 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 20:20:55,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2299510.0, ans=0.0 2024-08-13 20:21:19,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2299710.0, ans=0.0 2024-08-13 20:21:32,914 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12600, loss[loss=0.09914, beats_loss=0.01132, ecapa_loss=0.0001306, whisper_loss=0.08651, over 22933.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.0001626, whisper_loss=0.09111, over 3839127.13 frames. 
], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:22:07,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2300010.0, ans=0.04949747468305833 2024-08-13 20:22:12,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2300010.0, ans=0.125 2024-08-13 20:22:29,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2300110.0, ans=0.0 2024-08-13 20:22:30,489 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 20:22:38,719 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 20:22:50,734 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12650, loss[loss=0.09572, beats_loss=0.009986, ecapa_loss=0.0002007, whisper_loss=0.08373, over 13940.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01091, ecapa_loss=0.0001622, whisper_loss=0.09057, over 3843076.10 frames. ], batch size: 60, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:22:59,939 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 20:23:13,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.379e+01 2.634e+01 2.946e+01 5.512e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-13 20:23:15,012 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 20:23:22,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2300510.0, ans=0.07 2024-08-13 20:23:35,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2300610.0, ans=0.0 2024-08-13 20:23:51,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2300710.0, ans=0.0 2024-08-13 20:23:59,952 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-13 20:24:07,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12700, loss[loss=0.1071, beats_loss=0.01067, ecapa_loss=0.0001559, whisper_loss=0.09488, over 16588.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.000161, whisper_loss=0.09164, over 3843988.99 frames. ], batch size: 68, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:24:07,687 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 20:24:17,360 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 20:24:21,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2024-08-13 20:24:31,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2300910.0, ans=0.125 2024-08-13 20:24:43,700 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 20:24:53,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0 2024-08-13 20:25:12,614 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
16 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 20:25:21,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2301210.0, ans=0.125 2024-08-13 20:25:26,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12750, loss[loss=0.1051, beats_loss=0.01184, ecapa_loss=0.0001613, whisper_loss=0.09167, over 22720.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001634, whisper_loss=0.09179, over 3828557.28 frames. ], batch size: 90, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:25:31,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2301310.0, ans=0.125 2024-08-13 20:25:50,291 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.318e+01 2.587e+01 2.901e+01 2.435e+02, threshold=5.175e+01, percent-clipped=0.0 2024-08-13 20:25:57,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2301510.0, ans=0.0 2024-08-13 20:25:59,419 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-13 20:26:00,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2301510.0, ans=0.125 2024-08-13 20:26:04,508 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 20:26:04,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2301510.0, ans=10.0 2024-08-13 20:26:19,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=2301610.0, ans=0.1 2024-08-13 20:26:31,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2301710.0, ans=0.1 2024-08-13 20:26:31,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2301710.0, ans=0.2 2024-08-13 20:26:31,564 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=12.0 2024-08-13 20:26:45,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12800, loss[loss=0.112, beats_loss=0.01047, ecapa_loss=0.000178, whisper_loss=0.09975, over 17778.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001642, whisper_loss=0.09106, over 3817777.99 frames. ], batch size: 73, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:26:55,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2301810.0, ans=0.0 2024-08-13 20:26:57,364 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 20:27:02,100 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 20:27:19,203 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 20:27:40,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2302110.0, ans=0.0 2024-08-13 20:27:44,231 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 20:28:03,095 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12850, loss[loss=0.087, beats_loss=0.01149, ecapa_loss=0.0002067, whisper_loss=0.07344, over 14169.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001633, whisper_loss=0.09094, over 3805281.89 frames. ], batch size: 63, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:28:26,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.352e+01 2.567e+01 2.932e+01 5.459e+01, threshold=5.134e+01, percent-clipped=2.0 2024-08-13 20:29:02,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2302710.0, ans=0.1 2024-08-13 20:29:20,477 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12900, loss[loss=0.1073, beats_loss=0.01218, ecapa_loss=0.0001221, whisper_loss=0.09393, over 23410.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001637, whisper_loss=0.09048, over 3814935.20 frames. ], batch size: 90, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:29:20,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2302810.0, ans=0.125 2024-08-13 20:29:48,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2302910.0, ans=0.1 2024-08-13 20:29:49,504 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 20:30:01,071 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 20:30:12,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. 
limit=15.0 2024-08-13 20:30:26,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-13 20:30:39,619 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 12950, loss[loss=0.06698, beats_loss=0.01421, ecapa_loss=0.0001108, whisper_loss=0.05166, over 16169.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01085, ecapa_loss=0.0001633, whisper_loss=0.09049, over 3853247.05 frames. ], batch size: 63, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:30:41,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2303310.0, ans=0.125 2024-08-13 20:30:50,539 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 20:30:58,179 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 20:30:59,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2303410.0, ans=0.125 2024-08-13 20:31:01,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.283e+01 2.671e+01 2.992e+01 6.489e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 20:31:06,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.22 vs. limit=22.5 2024-08-13 20:31:09,347 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 20:31:19,264 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 20:31:22,851 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 20:31:28,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2303610.0, ans=0.125 2024-08-13 20:31:52,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2303710.0, ans=0.125 2024-08-13 20:31:59,094 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13000, loss[loss=0.1156, beats_loss=0.01035, ecapa_loss=0.0001619, whisper_loss=0.1036, over 23024.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001648, whisper_loss=0.09111, over 3877893.67 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:32:13,904 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-13 20:32:16,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2303810.0, ans=0.125 2024-08-13 20:32:29,215 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 20:32:46,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2304010.0, ans=0.125 2024-08-13 20:32:53,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2304110.0, ans=0.125 2024-08-13 20:33:00,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2304110.0, ans=0.5 2024-08-13 20:33:03,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2304110.0, ans=0.2 2024-08-13 20:33:04,717 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
20 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 20:33:09,243 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 20:33:11,105 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 20:33:15,591 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 20:33:15,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2304210.0, ans=0.2 2024-08-13 20:33:17,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2304210.0, ans=0.0 2024-08-13 20:33:23,569 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13050, loss[loss=0.1125, beats_loss=0.009211, ecapa_loss=0.0001868, whisper_loss=0.1014, over 21370.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001651, whisper_loss=0.09137, over 3888982.97 frames. ], batch size: 86, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:33:32,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2304310.0, ans=0.125 2024-08-13 20:33:46,815 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 20:33:55,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.312e+01 2.609e+01 2.956e+01 5.975e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-13 20:34:08,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2304510.0, ans=0.125 2024-08-13 20:34:49,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-08-13 20:35:02,499 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 20:35:16,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13100, loss[loss=0.1388, beats_loss=0.006372, ecapa_loss=0.0001767, whisper_loss=0.1307, over 16338.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0108, ecapa_loss=0.0001635, whisper_loss=0.0911, over 3908688.49 frames. ], batch size: 63, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:35:16,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2304810.0, ans=0.125 2024-08-13 20:35:16,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2304810.0, ans=0.125 2024-08-13 20:35:38,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2304910.0, ans=0.0 2024-08-13 20:36:11,901 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-13 20:36:12,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2305010.0, ans=0.2 2024-08-13 20:36:25,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-08-13 20:36:27,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.35 vs. limit=22.5 2024-08-13 20:36:32,466 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 20:36:42,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.00 vs. 
limit=15.0 2024-08-13 20:37:02,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13150, loss[loss=0.1052, beats_loss=0.0111, ecapa_loss=0.0001797, whisper_loss=0.0923, over 18047.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001628, whisper_loss=0.09082, over 3890649.48 frames. ], batch size: 73, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:37:19,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2305310.0, ans=0.04949747468305833 2024-08-13 20:37:37,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2305410.0, ans=0.125 2024-08-13 20:37:39,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.460e+01 2.677e+01 2.995e+01 4.365e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-13 20:37:39,283 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 25 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 20:37:53,029 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 20:38:10,803 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 20:39:04,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13200, loss[loss=0.08518, beats_loss=0.01351, ecapa_loss=0.0001227, whisper_loss=0.07044, over 21390.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001623, whisper_loss=0.09072, over 3854260.92 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:39:26,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.02 vs. limit=10.0 2024-08-13 20:39:32,972 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 20:39:37,405 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 20:40:08,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2306010.0, ans=0.125 2024-08-13 20:40:18,884 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 41 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 20:40:27,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2306110.0, ans=0.0 2024-08-13 20:40:41,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2306110.0, ans=0.125 2024-08-13 20:41:12,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13250, loss[loss=0.1099, beats_loss=0.01111, ecapa_loss=0.0001409, whisper_loss=0.09735, over 23443.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001623, whisper_loss=0.09198, over 3874204.41 frames. ], batch size: 91, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:41:43,584 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 20:41:46,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2306410.0, ans=0.125 2024-08-13 20:41:48,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.383e+01 2.626e+01 2.998e+01 4.392e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-13 20:42:19,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2306510.0, ans=0.07 2024-08-13 20:42:32,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2306610.0, ans=0.0 2024-08-13 20:42:39,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2306710.0, ans=0.2 2024-08-13 20:42:53,573 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13300, loss[loss=0.1073, beats_loss=0.01263, ecapa_loss=0.0001211, whisper_loss=0.09342, over 23803.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001616, whisper_loss=0.09165, over 3822694.53 frames. 
], batch size: 91, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:42:55,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2306810.0, ans=0.125 2024-08-13 20:43:01,655 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08281490951776505, model_norm_threshold=52.51310729980469 2024-08-13 20:43:01,876 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.983e+04, grad_sumsq=6.983e+04, orig_rms_sq=1.000e+00 2024-08-13 20:43:09,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2306910.0, ans=0.125 2024-08-13 20:43:09,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2306910.0, ans=0.125 2024-08-13 20:43:21,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=2306910.0, ans=0.2 2024-08-13 20:43:30,962 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-13 20:43:36,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2307010.0, ans=0.125 2024-08-13 20:44:13,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13350, loss[loss=0.1205, beats_loss=0.01215, ecapa_loss=0.00013, whisper_loss=0.107, over 23714.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001616, whisper_loss=0.09161, over 3850633.69 frames. ], batch size: 91, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:44:26,382 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
29 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 20:44:38,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.444e+01 2.749e+01 3.154e+01 6.341e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 20:44:39,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2307410.0, ans=0.125 2024-08-13 20:44:50,334 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 16 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 20:44:53,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2307510.0, ans=0.1 2024-08-13 20:45:01,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2307610.0, ans=0.0 2024-08-13 20:45:02,770 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 20:45:04,735 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-13 20:45:17,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2307710.0, ans=0.0 2024-08-13 20:45:30,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2307710.0, ans=0.1 2024-08-13 20:45:34,188 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13400, loss[loss=0.0918, beats_loss=0.01032, ecapa_loss=0.0001526, whisper_loss=0.07995, over 16577.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.000161, whisper_loss=0.0914, over 3840583.50 frames. 
], batch size: 66, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:45:35,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2307810.0, ans=0.2 2024-08-13 20:45:49,491 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2024-08-13 20:46:04,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2307910.0, ans=0.125 2024-08-13 20:46:12,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2308010.0, ans=0.1 2024-08-13 20:46:13,581 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 20:46:25,059 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 20:46:29,919 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 20:46:30,207 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.196e+01 2024-08-13 20:46:33,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-13 20:46:50,636 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 20:46:54,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13450, loss[loss=0.1087, beats_loss=0.009227, ecapa_loss=0.0001635, whisper_loss=0.09786, over 15596.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001615, whisper_loss=0.0913, over 3820050.53 frames. 
], batch size: 58, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:47:02,743 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.609e-03 2024-08-13 20:47:17,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.358e+01 2.676e+01 3.282e+01 1.336e+02, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 20:47:20,158 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 20:47:29,865 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 20:47:34,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2308510.0, ans=0.2 2024-08-13 20:47:36,414 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 20:48:02,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2308710.0, ans=0.125 2024-08-13 20:48:02,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2308710.0, ans=0.0 2024-08-13 20:48:02,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2308710.0, ans=0.0 2024-08-13 20:48:13,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13500, loss[loss=0.0794, beats_loss=0.01583, ecapa_loss=0.0001397, whisper_loss=0.06218, over 20845.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001611, whisper_loss=0.09093, over 3849060.73 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:48:24,823 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. 
limit=10.0 2024-08-13 20:48:27,519 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 20:48:32,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2308910.0, ans=0.0 2024-08-13 20:48:50,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2309010.0, ans=0.125 2024-08-13 20:48:54,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2309010.0, ans=0.125 2024-08-13 20:49:00,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2309010.0, ans=0.1 2024-08-13 20:49:05,133 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 20:49:11,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2309110.0, ans=0.125 2024-08-13 20:49:13,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2309110.0, ans=0.0 2024-08-13 20:49:15,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.36 vs. limit=10.0 2024-08-13 20:49:36,489 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13550, loss[loss=0.09599, beats_loss=0.0129, ecapa_loss=0.000155, whisper_loss=0.08154, over 20887.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001604, whisper_loss=0.0905, over 3868331.49 frames. ], batch size: 85, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:49:50,027 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
24 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 20:49:58,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2309410.0, ans=0.125 2024-08-13 20:50:00,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.439e+01 2.648e+01 3.076e+01 1.090e+02, threshold=5.296e+01, percent-clipped=1.0 2024-08-13 20:50:23,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2309610.0, ans=0.1 2024-08-13 20:50:32,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2024-08-13 20:50:48,547 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 20:50:56,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13600, loss[loss=0.08299, beats_loss=0.01092, ecapa_loss=0.0001686, whisper_loss=0.07038, over 19398.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01089, ecapa_loss=0.0001599, whisper_loss=0.09056, over 3885132.11 frames. ], batch size: 81, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:51:32,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2310010.0, ans=0.125 2024-08-13 20:51:33,121 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
20 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 20:51:33,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2310010.0, ans=0.1 2024-08-13 20:51:36,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2310010.0, ans=0.125 2024-08-13 20:51:44,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2310110.0, ans=0.2 2024-08-13 20:51:54,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2310110.0, ans=0.125 2024-08-13 20:51:58,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2310210.0, ans=0.0 2024-08-13 20:52:15,404 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13650, loss[loss=0.09931, beats_loss=0.01064, ecapa_loss=0.0001655, whisper_loss=0.08701, over 18972.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01097, ecapa_loss=0.0001611, whisper_loss=0.0902, over 3878398.70 frames. ], batch size: 74, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:52:20,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2310310.0, ans=0.2 2024-08-13 20:52:37,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.402e+01 2.676e+01 3.034e+01 5.771e+01, threshold=5.352e+01, percent-clipped=1.0 2024-08-13 20:52:56,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2310510.0, ans=0.07 2024-08-13 20:53:07,179 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 20:53:30,196 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13700, loss[loss=0.1057, beats_loss=0.00913, ecapa_loss=0.0001727, whisper_loss=0.09487, over 21698.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01097, ecapa_loss=0.000161, whisper_loss=0.09033, over 3871535.56 frames. ], batch size: 88, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:53:33,479 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 20:53:39,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2310810.0, ans=0.125 2024-08-13 20:53:51,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2024-08-13 20:53:59,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2311010.0, ans=0.125 2024-08-13 20:54:00,865 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.577e+01 2024-08-13 20:54:15,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2311110.0, ans=0.0 2024-08-13 20:54:26,580 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 20:54:35,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-13 20:54:43,313 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13750, loss[loss=0.1303, beats_loss=0.009526, ecapa_loss=0.0001557, whisper_loss=0.1192, over 17432.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01088, ecapa_loss=0.0001612, whisper_loss=0.09065, over 3860979.97 frames. 
], batch size: 66, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:54:49,359 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 20:54:56,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2311410.0, ans=0.125 2024-08-13 20:55:05,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.299e+01 2.662e+01 2.929e+01 4.195e+01, threshold=5.323e+01, percent-clipped=0.0 2024-08-13 20:55:12,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2311510.0, ans=0.07 2024-08-13 20:55:34,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2311610.0, ans=0.1 2024-08-13 20:55:41,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2311710.0, ans=0.1 2024-08-13 20:55:45,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2024-08-13 20:55:51,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2311710.0, ans=0.2 2024-08-13 20:55:57,181 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13800, loss[loss=0.1121, beats_loss=0.009376, ecapa_loss=0.0002005, whisper_loss=0.1007, over 18350.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001597, whisper_loss=0.09051, over 3861734.78 frames. 
], batch size: 75, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:56:11,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2311910.0, ans=0.5
2024-08-13 20:56:16,360 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 from AS
2024-08-13 20:56:39,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2312110.0, ans=0.125
2024-08-13 20:56:45,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2312110.0, ans=0.0
2024-08-13 20:56:56,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=12.0
2024-08-13 20:57:04,586 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 20:57:08,959 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13850, loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001794, whisper_loss=0.08987, over 20129.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01086, ecapa_loss=0.0001598, whisper_loss=0.09035, over 3838114.26 frames. ], batch size: 83, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:57:13,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2312310.0, ans=0.2
2024-08-13 20:57:14,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2312310.0, ans=15.0
2024-08-13 20:57:29,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0
2024-08-13 20:57:31,126 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.385e+01 2.789e+01 3.325e+01 4.881e+01, threshold=5.578e+01, percent-clipped=0.0
2024-08-13 20:57:51,394 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 from AS
2024-08-13 20:57:53,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2312610.0, ans=0.95
2024-08-13 20:57:58,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2312610.0, ans=0.1
2024-08-13 20:57:58,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2312610.0, ans=0.0
2024-08-13 20:58:01,231 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 from AS
2024-08-13 20:58:08,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2312710.0, ans=0.2
2024-08-13 20:58:11,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2312710.0, ans=0.125
2024-08-13 20:58:11,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2312710.0, ans=0.0
2024-08-13 20:58:21,957 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13900, loss[loss=0.1035, beats_loss=0.01118, ecapa_loss=0.0001842, whisper_loss=0.09044, over 17099.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001604, whisper_loss=0.09073, over 3825602.81 frames. ], batch size: 66, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:58:50,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0
2024-08-13 20:58:59,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2313010.0, ans=0.0
2024-08-13 20:59:19,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0
2024-08-13 20:59:22,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2313210.0, ans=0.125
2024-08-13 20:59:34,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 13950, loss[loss=0.1007, beats_loss=0.01138, ecapa_loss=0.0001806, whisper_loss=0.08752, over 15990.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001604, whisper_loss=0.09174, over 3847945.08 frames. ], batch size: 68, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:59:40,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2313310.0, ans=0.0
2024-08-13 20:59:46,447 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 14 from Vox, 46 from AS
2024-08-13 20:59:52,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2313410.0, ans=0.2
2024-08-13 20:59:56,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.456e+01 2.773e+01 3.183e+01 5.275e+01, threshold=5.547e+01, percent-clipped=0.0
2024-08-13 20:59:58,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2313410.0, ans=0.125
2024-08-13 21:00:11,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2313510.0, ans=0.125
2024-08-13 21:00:16,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2313510.0, ans=0.0
2024-08-13 21:00:17,837 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS
2024-08-13 21:00:20,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2313610.0, ans=0.125
2024-08-13 21:00:48,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14000, loss[loss=0.1191, beats_loss=0.00884, ecapa_loss=0.0001589, whisper_loss=0.1087, over 21741.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.000159, whisper_loss=0.09177, over 3849930.23 frames. ], batch size: 84, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:00:52,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2313810.0, ans=0.125
2024-08-13 21:00:55,048 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 from AS
2024-08-13 21:01:01,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2313910.0, ans=0.0
2024-08-13 21:01:02,941 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 29 from LS+wenet, 18 from Vox, 49 from AS
2024-08-13 21:01:03,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2313910.0, ans=0.2
2024-08-13 21:01:14,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2313910.0, ans=0.5
2024-08-13 21:01:34,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2314110.0, ans=0.0
2024-08-13 21:01:38,710 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 27 from Vox, 29 from AS
2024-08-13 21:01:40,303 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 21:01:45,009 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 from AS
2024-08-13 21:02:02,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14050, loss[loss=0.1023, beats_loss=0.01254, ecapa_loss=0.0001361, whisper_loss=0.08836, over 22840.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001591, whisper_loss=0.09138, over 3835853.40 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:02:23,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0
2024-08-13 21:02:24,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.387e+01 2.626e+01 2.873e+01 4.572e+01, threshold=5.251e+01, percent-clipped=0.0
2024-08-13 21:02:44,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2314610.0, ans=0.1
2024-08-13 21:02:53,124 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 from AS
2024-08-13 21:03:15,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14100, loss[loss=0.1106, beats_loss=0.01122, ecapa_loss=0.0001449, whisper_loss=0.09798, over 23560.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.0001604, whisper_loss=0.09108, over 3847530.06 frames. ], batch size: 93, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:03:25,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2314810.0, ans=0.125
2024-08-13 21:03:30,565 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-13 21:03:37,222 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 from AS
2024-08-13 21:03:53,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2315010.0, ans=0.0
2024-08-13 21:03:58,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2315110.0, ans=0.5
2024-08-13 21:04:01,702 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 16 from LS+wenet, 11 from Vox, 26 from AS
2024-08-13 21:04:04,704 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 17 from Vox, 35 from AS
2024-08-13 21:04:12,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2315210.0, ans=0.125
2024-08-13 21:04:13,292 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 8 from Vox, 43 from AS
2024-08-13 21:04:22,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2315210.0, ans=0.125
2024-08-13 21:04:24,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2315210.0, ans=0.0
2024-08-13 21:04:26,862 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14150, loss[loss=0.09613, beats_loss=0.01264, ecapa_loss=0.0001578, whisper_loss=0.08191, over 21396.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.000159, whisper_loss=0.09142, over 3811001.80 frames. ], batch size: 88, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:04:39,340 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.741e+01
2024-08-13 21:04:46,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2315410.0, ans=0.125
2024-08-13 21:04:49,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.465e+01 2.633e+01 2.994e+01 4.985e+01, threshold=5.265e+01, percent-clipped=0.0
2024-08-13 21:05:36,826 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 21:05:38,582 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 from AS
2024-08-13 21:05:41,760 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14200, loss[loss=0.1107, beats_loss=0.01089, ecapa_loss=0.0001321, whisper_loss=0.09845, over 23580.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001582, whisper_loss=0.09129, over 3861945.98 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:05:42,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2315810.0, ans=0.0
2024-08-13 21:06:00,677 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 21:06:07,569 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 12 from Vox, 34 from AS
2024-08-13 21:06:18,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2316010.0, ans=0.125
2024-08-13 21:06:24,843 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-13 21:06:31,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2316110.0, ans=10.0
2024-08-13 21:06:33,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2316110.0, ans=10.0
2024-08-13 21:07:00,903 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14250, loss[loss=0.1105, beats_loss=0.01053, ecapa_loss=0.0001862, whisper_loss=0.09816, over 20800.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001583, whisper_loss=0.09182, over 3869714.28 frames. ], batch size: 85, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:07:01,292 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS
2024-08-13 21:07:01,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2316310.0, ans=0.1
2024-08-13 21:07:05,842 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 from AS
2024-08-13 21:07:09,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2316310.0, ans=0.125
2024-08-13 21:07:13,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2316310.0, ans=0.05
2024-08-13 21:07:24,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.489e+01 2.737e+01 3.188e+01 4.877e+01, threshold=5.475e+01, percent-clipped=0.0
2024-08-13 21:07:41,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2316510.0, ans=0.125
2024-08-13 21:08:02,061 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 from AS
2024-08-13 21:08:17,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14300, loss[loss=0.07527, beats_loss=0.01518, ecapa_loss=9.046e-05, whisper_loss=0.05918, over 23393.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001577, whisper_loss=0.09132, over 3892637.41 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:08:52,033 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS
2024-08-13 21:08:53,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0
2024-08-13 21:09:22,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2317210.0, ans=0.125
2024-08-13 21:09:33,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14350, loss[loss=0.09796, beats_loss=0.01293, ecapa_loss=0.0001378, whisper_loss=0.08365, over 22698.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001589, whisper_loss=0.09146, over 3904564.22 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:09:39,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.23 vs. limit=10.0
2024-08-13 21:09:45,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2317310.0, ans=0.125
2024-08-13 21:09:50,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2317410.0, ans=0.125
2024-08-13 21:09:56,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.382e+01 2.716e+01 3.017e+01 1.009e+02, threshold=5.432e+01, percent-clipped=2.0
2024-08-13 21:10:03,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2317510.0, ans=0.125
2024-08-13 21:10:18,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2317610.0, ans=0.2
2024-08-13 21:10:20,954 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 from AS
2024-08-13 21:10:32,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0
2024-08-13 21:10:44,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2317710.0, ans=0.125
2024-08-13 21:10:47,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2317710.0, ans=0.125
2024-08-13 21:10:48,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2317810.0, ans=0.125
2024-08-13 21:10:49,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14400, loss[loss=0.1095, beats_loss=0.01408, ecapa_loss=0.000151, whisper_loss=0.09388, over 22302.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001597, whisper_loss=0.09155, over 3926487.27 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:10:52,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=12.0
2024-08-13 21:10:56,732 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0
2024-08-13 21:10:59,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2317810.0, ans=0.0
2024-08-13 21:11:02,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2317810.0, ans=0.125
2024-08-13 21:11:08,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2317910.0, ans=15.0
2024-08-13 21:11:14,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2317910.0, ans=0.0
2024-08-13 21:11:18,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2318010.0, ans=0.07
2024-08-13 21:11:40,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0
2024-08-13 21:11:52,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2318210.0, ans=0.125
2024-08-13 21:11:53,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2318210.0, ans=0.05
2024-08-13 21:11:55,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2318210.0, ans=0.0
2024-08-13 21:12:04,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2318310.0, ans=0.125
2024-08-13 21:12:05,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 14450, loss[loss=0.09678, beats_loss=0.01061, ecapa_loss=0.0001715, whisper_loss=0.08445, over 20961.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001621, whisper_loss=0.09076, over 3909013.12 frames. ], batch size: 85, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:12:19,484 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 from AS
2024-08-13 21:12:24,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2318410.0, ans=0.2
2024-08-13 21:12:28,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.376e+01 2.680e+01 3.028e+01 6.046e+01, threshold=5.360e+01, percent-clipped=1.0
2024-08-13 21:12:35,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2318510.0, ans=0.0
2024-08-13 21:12:41,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=22.5
2024-08-13 21:12:57,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2318610.0, ans=0.0
2024-08-13 21:13:47,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 0, loss[loss=0.1012, beats_loss=0.01092, ecapa_loss=0.0001498, whisper_loss=0.08883, over 17545.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01092, ecapa_loss=0.0001498, whisper_loss=0.08883, over 17545.00 frames. ], batch size: 67, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:13:47,644 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-13 21:14:29,953 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2478, over 922467.00 frames.
2024-08-13 21:14:46,181 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on SV_voxceleb1: loss=0.004509, beats_loss=0, ecapa_loss=0.0004509, whisper_loss=0, over 939242.00 frames.
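The optim.py records above report five grad-norm statistics (min, 25%, median, 75%, max), a threshold, and the fraction of batches clipped. In every entry the threshold equals Clipping_scale times the median grad norm (e.g. 2.0 x 2.680e+01 = 5.360e+01 in the record just above). A minimal sketch of such a median-based clipping rule, assuming that relationship; this is an illustration, not the actual icefall optim.py implementation:

```python
# Illustrative sketch (NOT the icefall optim.py code): derive a gradient
# clipping threshold as clipping_scale * median of recently observed grad
# norms, matching the "threshold = Clipping_scale x median" relationship
# visible in the log records above.
from statistics import median


def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    """Threshold for the next batches: clipping_scale times the median norm."""
    return clipping_scale * median(recent_grad_norms)


def clip_factor(grad_norm, threshold):
    """Factor to scale gradients by; 1.0 means the batch is not clipped."""
    return min(1.0, threshold / grad_norm)
```

Fed the quartiles from the record above (19.18, 23.76, 26.80, 30.28, 60.46), the sketch reproduces the logged threshold of 53.60; only norms above that, such as the 60.46 maximum, would be scaled down.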
2024-08-13 21:15:19,681 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.7808, 2.1434, 2.1535, 1.9568], device='cuda:1')
2024-08-13 21:16:46,345 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on AT_audioset: loss=0.02361, beats_loss=0.02361, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 21:16:46,348 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-13 21:16:46,567 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 from AS
2024-08-13 21:16:48,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2318730.0, ans=0.0
2024-08-13 21:17:01,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2318730.0, ans=0.125
2024-08-13 21:17:04,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2318730.0, ans=0.0
2024-08-13 21:17:28,790 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 22 from Vox, 28 from AS
2024-08-13 21:18:07,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=12.0
2024-08-13 21:18:23,600 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 from AS
2024-08-13 21:18:27,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2319030.0, ans=0.125
2024-08-13 21:18:42,066 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 21:18:45,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2319130.0, ans=0.125
2024-08-13 21:18:55,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2319130.0, ans=0.0
2024-08-13 21:18:58,532 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 50, loss[loss=0.1017, beats_loss=0.009921, ecapa_loss=0.0001677, whisper_loss=0.09007, over 17298.00 frames. ], tot_loss[loss=0.102, beats_loss=0.009684, ecapa_loss=0.0001687, whisper_loss=0.09065, over 897794.98 frames. ], batch size: 68, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:19:10,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0
2024-08-13 21:19:14,457 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0
2024-08-13 21:19:32,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2319330.0, ans=0.125
2024-08-13 21:19:54,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2319430.0, ans=0.035
2024-08-13 21:19:55,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.701e+01 3.109e+01 3.430e+01 6.788e+01, threshold=6.217e+01, percent-clipped=2.0
2024-08-13 21:20:12,261 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 26 from Vox, 25 from AS
2024-08-13 21:20:20,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2319530.0, ans=0.0
2024-08-13 21:20:31,384 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 from AS
2024-08-13 21:20:44,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=12.0
2024-08-13 21:20:45,196 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 from AS
2024-08-13 21:20:57,404 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 from AS
2024-08-13 21:20:57,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2319730.0, ans=0.125
2024-08-13 21:21:00,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 100, loss[loss=0.1204, beats_loss=0.01045, ecapa_loss=0.0001778, whisper_loss=0.1082, over 22268.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.00962, ecapa_loss=0.000167, whisper_loss=0.09059, over 1540757.82 frames. ], batch size: 87, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:21:26,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2319830.0, ans=0.125
2024-08-13 21:22:04,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2319930.0, ans=0.125
2024-08-13 21:22:12,479 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 from AS
2024-08-13 21:22:16,540 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 from AS
2024-08-13 21:22:24,184 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.551e-02
2024-08-13 21:22:26,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2320030.0, ans=0.125
2024-08-13 21:22:32,233 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2024-08-13 21:22:35,588 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 from AS
2024-08-13 21:22:36,835 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 from AS
2024-08-13 21:22:38,984 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 from AS
2024-08-13 21:22:41,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2320130.0, ans=0.0
2024-08-13 21:22:52,632 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 150, loss[loss=0.1021, beats_loss=0.01253, ecapa_loss=0.00014, whisper_loss=0.08821, over 22802.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.009777, ecapa_loss=0.0001623, whisper_loss=0.09131, over 2050943.94 frames. ], batch size: 88, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:23:01,703 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 from AS
2024-08-13 21:23:10,805 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 18 from Vox, 36 from AS
2024-08-13 21:23:31,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=12.0
2024-08-13 21:23:32,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.591e+01 2.910e+01 3.226e+01 4.259e+01, threshold=5.820e+01, percent-clipped=0.0
2024-08-13 21:23:34,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2320430.0, ans=0.2
2024-08-13 21:23:38,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2320430.0, ans=0.2
2024-08-13 21:23:45,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2320530.0, ans=0.1
2024-08-13 21:23:49,181 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 12 from Vox, 29 from AS
2024-08-13 21:23:49,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2320530.0, ans=0.95
2024-08-13 21:24:03,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2320630.0, ans=0.1
2024-08-13 21:24:15,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2320730.0, ans=0.0
2024-08-13 21:24:16,136 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 200, loss[loss=0.09145, beats_loss=0.00899, ecapa_loss=0.0001512, whisper_loss=0.08095, over 22195.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.00995, ecapa_loss=0.0001614, whisper_loss=0.09129, over 2441319.18 frames. ], batch size: 88, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:24:19,249 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 from AS
2024-08-13 21:24:20,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2320730.0, ans=0.125
2024-08-13 21:24:29,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2320730.0, ans=0.2
2024-08-13 21:24:35,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2320830.0, ans=0.125
2024-08-13 21:24:54,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5
2024-08-13 21:25:05,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=22.5
2024-08-13 21:25:19,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2321030.0, ans=0.125
2024-08-13 21:25:31,316 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 from AS
2024-08-13 21:25:38,710 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 250, loss[loss=0.1186, beats_loss=0.009867, ecapa_loss=0.0001406, whisper_loss=0.1074, over 20935.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01015, ecapa_loss=0.0001606, whisper_loss=0.0921, over 2785321.99 frames. ], batch size: 79, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:25:55,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2321330.0, ans=0.1
2024-08-13 21:25:58,291 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS
2024-08-13 21:26:07,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2321330.0, ans=0.0
2024-08-13 21:26:16,243 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 29 from Vox, 23 from AS
2024-08-13 21:26:17,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.338e+01 2.625e+01 3.056e+01 3.496e+02, threshold=5.250e+01, percent-clipped=1.0
2024-08-13 21:26:42,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2321530.0, ans=0.05
2024-08-13 21:26:55,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2321630.0, ans=0.5
2024-08-13 21:27:02,168 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 300, loss[loss=0.1263, beats_loss=0.009761, ecapa_loss=0.0001493, whisper_loss=0.115, over 19862.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01027, ecapa_loss=0.0001612, whisper_loss=0.09196, over 3020479.88 frames. ], batch size: 73, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:27:04,154 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 from AS
2024-08-13 21:27:21,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2321830.0, ans=0.0
2024-08-13 21:27:25,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0
2024-08-13 21:27:44,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2321930.0, ans=0.0
2024-08-13 21:27:59,403 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 10 from LS+wenet, 21 from Vox, 24 from AS
2024-08-13 21:28:04,743 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 18 from Vox, 29 from AS
2024-08-13 21:28:12,505 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 from AS
2024-08-13 21:28:14,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2322130.0, ans=0.95
2024-08-13 21:28:24,486 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 21:28:29,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 350, loss[loss=0.1054, beats_loss=0.01251, ecapa_loss=0.0001442, whisper_loss=0.09148, over 22696.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.0001614, whisper_loss=0.09136, over 3206904.49 frames. ], batch size: 91, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:28:51,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2322330.0, ans=0.1
2024-08-13 21:28:58,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. limit=6.0
2024-08-13 21:29:08,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.409e+01 2.739e+01 3.112e+01 5.763e+01, threshold=5.479e+01, percent-clipped=3.0
2024-08-13 21:29:19,014 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 from AS
2024-08-13 21:29:21,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.55 vs. limit=22.5
2024-08-13 21:29:23,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0
2024-08-13 21:29:32,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2322530.0, ans=0.0
2024-08-13 21:29:42,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2322630.0, ans=0.0
2024-08-13 21:29:54,351 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS
2024-08-13 21:29:57,537 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 400, loss[loss=0.09962, beats_loss=0.01143, ecapa_loss=0.0001723, whisper_loss=0.08646, over 22040.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001621, whisper_loss=0.09103, over 3361096.64 frames. ], batch size: 88, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:30:06,201 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 25 from Vox, 47 from AS
2024-08-13 21:30:08,481 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS
2024-08-13 21:30:13,748 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 from AS
2024-08-13 21:30:23,739 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 12 from LS+wenet, 19 from Vox, 36 from AS
2024-08-13 21:30:32,343 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 27 from Vox, 14 from AS
2024-08-13 21:30:32,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2322930.0, ans=0.1
2024-08-13 21:30:37,859 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 from AS
2024-08-13 21:30:43,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2322930.0, ans=0.0
2024-08-13 21:30:45,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0
2024-08-13 21:30:59,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0
2024-08-13 21:31:00,186 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 23 from Vox, 31 from AS
2024-08-13 21:31:09,755 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0
2024-08-13 21:31:16,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2323130.0, ans=0.0
2024-08-13 21:31:25,631 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 450, loss[loss=0.09131, beats_loss=0.01132, ecapa_loss=0.000177, whisper_loss=0.07822, over 17993.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001617, whisper_loss=0.09001, over 3473348.22 frames. ], batch size: 73, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:31:33,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2323230.0, ans=0.015
2024-08-13 21:31:37,629 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 from AS
2024-08-13 21:31:41,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0
2024-08-13 21:32:05,262 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts.
34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 21:32:06,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.385e+01 2.600e+01 2.999e+01 5.733e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-13 21:32:07,061 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 21:32:15,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2323430.0, ans=0.125 2024-08-13 21:32:15,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2323430.0, ans=0.125 2024-08-13 21:32:19,990 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 21:32:21,759 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 21:32:28,957 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.28 vs. limit=10.0 2024-08-13 21:32:39,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2323630.0, ans=0.2 2024-08-13 21:32:49,395 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 21:32:52,058 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 500, loss[loss=0.09142, beats_loss=0.01057, ecapa_loss=0.0001501, whisper_loss=0.07935, over 22006.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001629, whisper_loss=0.08948, over 3554823.92 frames. ], batch size: 87, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:32:53,892 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 21:33:10,145 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
23 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 21:33:10,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2323830.0, ans=0.125 2024-08-13 21:33:16,541 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 21:33:19,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-13 21:33:57,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2324130.0, ans=0.125 2024-08-13 21:34:05,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2324130.0, ans=0.125 2024-08-13 21:34:07,624 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 21:34:13,713 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 550, loss[loss=0.1062, beats_loss=0.01024, ecapa_loss=0.0001389, whisper_loss=0.09454, over 18291.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001626, whisper_loss=0.08971, over 3630216.34 frames. ], batch size: 69, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:34:20,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2324230.0, ans=0.125 2024-08-13 21:34:23,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2324230.0, ans=0.5 2024-08-13 21:34:39,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2324330.0, ans=0.125 2024-08-13 21:34:44,645 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
36 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 21:34:48,240 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.236e-01 2024-08-13 21:34:48,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2324430.0, ans=0.2 2024-08-13 21:34:51,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.278e+01 2.510e+01 2.744e+01 4.092e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-13 21:34:51,439 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 21:35:31,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 600, loss[loss=0.1171, beats_loss=0.01077, ecapa_loss=0.0001908, whisper_loss=0.1045, over 19768.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001622, whisper_loss=0.09068, over 3686952.55 frames. ], batch size: 81, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:35:39,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2324730.0, ans=0.1 2024-08-13 21:35:43,478 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-13 21:35:54,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2324830.0, ans=0.1 2024-08-13 21:35:54,758 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-08-13 21:36:08,709 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 21:36:09,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. 
limit=22.5 2024-08-13 21:36:14,182 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 21:36:16,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2325030.0, ans=0.0 2024-08-13 21:36:27,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2325130.0, ans=0.0 2024-08-13 21:36:28,261 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 21:36:29,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.96 vs. limit=15.0 2024-08-13 21:36:34,798 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 21:36:38,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 650, loss[loss=0.1029, beats_loss=0.01404, ecapa_loss=0.0001459, whisper_loss=0.08745, over 19103.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001624, whisper_loss=0.0912, over 3735662.37 frames. ], batch size: 78, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:36:52,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2024-08-13 21:37:02,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2325330.0, ans=0.125 2024-08-13 21:37:09,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.389e+01 2.704e+01 3.013e+01 8.978e+01, threshold=5.408e+01, percent-clipped=2.0 2024-08-13 21:37:13,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2325430.0, ans=0.1 2024-08-13 21:37:35,928 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 21:37:42,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=2325730.0, ans=0.2 2024-08-13 21:37:43,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 700, loss[loss=0.1101, beats_loss=0.007568, ecapa_loss=0.0001748, whisper_loss=0.1008, over 18706.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001611, whisper_loss=0.09117, over 3772131.74 frames. ], batch size: 71, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:38:06,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2325830.0, ans=0.125 2024-08-13 21:38:18,984 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 27 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 21:38:27,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=15.0 2024-08-13 21:38:35,678 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 21:38:47,761 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 21:38:48,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 750, loss[loss=0.09492, beats_loss=0.01139, ecapa_loss=0.0001324, whisper_loss=0.08221, over 16350.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.00016, whisper_loss=0.09042, over 3778205.63 frames. ], batch size: 64, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:38:54,247 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 21:39:12,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2024-08-13 21:39:19,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.349e+01 2.525e+01 2.805e+01 4.000e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-13 21:39:25,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2326430.0, ans=0.1 2024-08-13 21:39:33,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2326530.0, ans=0.0 2024-08-13 21:39:54,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 800, loss[loss=0.08991, beats_loss=0.0104, ecapa_loss=0.0001803, whisper_loss=0.07771, over 17524.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001582, whisper_loss=0.09022, over 3781255.72 frames. ], batch size: 71, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:39:58,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2326730.0, ans=0.5 2024-08-13 21:40:01,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=22.5 2024-08-13 21:40:05,252 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0 2024-08-13 21:40:06,415 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-13 21:40:29,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2326930.0, ans=0.125 2024-08-13 21:40:30,842 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-13 21:40:35,744 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
18 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 21:40:40,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2327030.0, ans=0.0 2024-08-13 21:40:50,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=22.5 2024-08-13 21:40:53,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2327130.0, ans=0.2 2024-08-13 21:40:59,055 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 850, loss[loss=0.09495, beats_loss=0.01241, ecapa_loss=0.0001445, whisper_loss=0.08109, over 15506.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001577, whisper_loss=0.0901, over 3750320.40 frames. ], batch size: 63, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:41:13,758 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 21:41:23,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2327330.0, ans=0.0 2024-08-13 21:41:30,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.387e+01 2.596e+01 3.123e+01 5.757e+01, threshold=5.192e+01, percent-clipped=1.0 2024-08-13 21:41:38,541 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 21:42:02,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2327630.0, ans=0.0 2024-08-13 21:42:04,918 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 900, loss[loss=0.09188, beats_loss=0.0112, ecapa_loss=0.0001442, whisper_loss=0.07924, over 20370.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.000157, whisper_loss=0.09042, over 3794015.72 frames. 
], batch size: 78, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:42:06,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2327730.0, ans=0.0 2024-08-13 21:42:15,532 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 21:42:24,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2327830.0, ans=0.05 2024-08-13 21:42:30,877 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 13 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 21:42:39,131 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.555e+01 2024-08-13 21:42:47,726 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-13 21:42:47,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2328030.0, ans=0.05 2024-08-13 21:43:04,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2328130.0, ans=0.125 2024-08-13 21:43:09,704 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 950, loss[loss=0.08869, beats_loss=0.01222, ecapa_loss=0.0001689, whisper_loss=0.07478, over 15312.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001572, whisper_loss=0.09008, over 3771080.29 frames. ], batch size: 63, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:43:10,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2328230.0, ans=0.125 2024-08-13 21:43:15,290 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 21:43:41,097 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.367e+01 2.632e+01 3.016e+01 5.732e+01, threshold=5.263e+01, percent-clipped=3.0 2024-08-13 21:44:12,496 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 21:44:15,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1000, loss[loss=0.1067, beats_loss=0.01214, ecapa_loss=0.0001223, whisper_loss=0.09333, over 19444.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001564, whisper_loss=0.09071, over 3774512.56 frames. ], batch size: 74, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:44:16,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2328730.0, ans=0.1 2024-08-13 21:44:19,256 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 35 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 21:44:36,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2328830.0, ans=0.125 2024-08-13 21:44:42,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2328930.0, ans=0.125 2024-08-13 21:44:53,446 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 21:45:01,240 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-13 21:45:03,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.65 vs. 
limit=5.0 2024-08-13 21:45:04,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2329030.0, ans=0.0 2024-08-13 21:45:10,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2329130.0, ans=0.125 2024-08-13 21:45:20,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1050, loss[loss=0.08512, beats_loss=0.01001, ecapa_loss=0.0001786, whisper_loss=0.07332, over 14866.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001561, whisper_loss=0.08999, over 3763230.07 frames. ], batch size: 62, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:45:26,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2329230.0, ans=0.2 2024-08-13 21:45:28,598 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 21:45:42,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2329330.0, ans=0.0 2024-08-13 21:45:42,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2329330.0, ans=0.125 2024-08-13 21:45:49,767 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 21:45:52,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.420e+01 2.664e+01 2.978e+01 4.899e+01, threshold=5.328e+01, percent-clipped=0.0 2024-08-13 21:46:23,745 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 21:46:26,198 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1100, loss[loss=0.1176, beats_loss=0.01197, ecapa_loss=0.0001243, whisper_loss=0.1044, over 23360.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001555, whisper_loss=0.09069, over 3789381.23 frames. ], batch size: 93, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:46:37,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-13 21:46:47,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2329830.0, ans=0.1 2024-08-13 21:46:48,894 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 21:46:58,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2329930.0, ans=0.0 2024-08-13 21:46:59,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2329930.0, ans=0.025 2024-08-13 21:47:02,161 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 29 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-13 21:47:02,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2329930.0, ans=0.0 2024-08-13 21:47:10,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2330030.0, ans=0.5 2024-08-13 21:47:24,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2330130.0, ans=0.0 2024-08-13 21:47:32,340 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1150, loss[loss=0.06077, beats_loss=0.01306, ecapa_loss=0.0001718, whisper_loss=0.04599, over 17266.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001556, whisper_loss=0.09049, over 3776988.59 frames. 
], batch size: 73, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:47:44,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2330330.0, ans=0.125 2024-08-13 21:47:44,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2330330.0, ans=0.125 2024-08-13 21:47:47,097 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 15 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 21:47:53,750 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 21:47:59,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.34 vs. limit=15.0 2024-08-13 21:48:01,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2330430.0, ans=0.0 2024-08-13 21:48:03,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.427e+01 2.716e+01 3.117e+01 1.034e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-13 21:48:34,157 INFO [train_multi_KD3.py:844] (1/4) A total of 98 cuts. 25 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-13 21:48:34,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2024-08-13 21:48:37,920 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1200, loss[loss=0.08683, beats_loss=0.01235, ecapa_loss=0.0001325, whisper_loss=0.07316, over 14434.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001559, whisper_loss=0.08964, over 3748956.00 frames. 
], batch size: 56, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:48:47,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2330730.0, ans=0.1 2024-08-13 21:49:05,974 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-13 21:49:08,517 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 21:49:23,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-13 21:49:31,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2024-08-13 21:49:34,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2331130.0, ans=0.1 2024-08-13 21:49:43,295 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1250, loss[loss=0.09522, beats_loss=0.01232, ecapa_loss=0.0001368, whisper_loss=0.08153, over 22523.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01078, ecapa_loss=0.0001552, whisper_loss=0.08999, over 3789488.72 frames. ], batch size: 88, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:49:45,861 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 29 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 21:49:56,177 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 21:49:58,516 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 21:50:14,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.232e+01 2.491e+01 2.766e+01 6.956e+01, threshold=4.983e+01, percent-clipped=1.0 2024-08-13 21:50:14,459 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 21:50:17,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2331430.0, ans=0.2 2024-08-13 21:50:19,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2331430.0, ans=0.0 2024-08-13 21:50:23,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2331530.0, ans=0.125 2024-08-13 21:50:36,644 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-13 21:50:43,165 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 21:50:44,514 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-13 21:50:48,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1300, loss[loss=0.09963, beats_loss=0.01235, ecapa_loss=0.0001507, whisper_loss=0.08577, over 17375.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01078, ecapa_loss=0.0001561, whisper_loss=0.08949, over 3776990.76 frames. 
], batch size: 69, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:50:53,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2331730.0, ans=0.0
2024-08-13 21:51:02,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2331830.0, ans=0.0
2024-08-13 21:51:39,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2332030.0, ans=0.1
2024-08-13 21:51:43,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2332130.0, ans=0.2
2024-08-13 21:51:47,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2332130.0, ans=0.0
2024-08-13 21:51:54,816 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 21:51:56,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1350, loss[loss=0.1051, beats_loss=0.01247, ecapa_loss=0.0001534, whisper_loss=0.09112, over 22623.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01081, ecapa_loss=0.000155, whisper_loss=0.08994, over 3791963.33 frames. ], batch size: 89, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:51:59,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2332230.0, ans=0.125
2024-08-13 21:52:04,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2332230.0, ans=0.125
2024-08-13 21:52:22,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2332330.0, ans=0.07
2024-08-13 21:52:27,805 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 from AS
2024-08-13 21:52:30,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.343e+01 2.685e+01 2.934e+01 4.089e+01, threshold=5.369e+01, percent-clipped=0.0
2024-08-13 21:52:31,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2332430.0, ans=0.2
2024-08-13 21:52:32,666 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 22 from Vox, 18 from AS
2024-08-13 21:52:39,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2332530.0, ans=0.125
2024-08-13 21:52:43,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.54 vs. limit=10.0
2024-08-13 21:53:10,592 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1400, loss[loss=0.0906, beats_loss=0.01229, ecapa_loss=0.0001215, whisper_loss=0.0771, over 20305.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001542, whisper_loss=0.09017, over 3771523.35 frames. ], batch size: 79, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:53:14,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2332730.0, ans=0.125
2024-08-13 21:53:25,267 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 from AS
2024-08-13 21:53:28,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2332830.0, ans=0.2
2024-08-13 21:53:33,832 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 from AS
2024-08-13 21:53:36,917 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 15 from Vox, 37 from AS
2024-08-13 21:53:41,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2332930.0, ans=0.0
2024-08-13 21:53:45,054 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 25 from Vox, 30 from AS
2024-08-13 21:54:08,023 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 38 from LS+wenet, 15 from Vox, 41 from AS
2024-08-13 21:54:14,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2333130.0, ans=0.1
2024-08-13 21:54:24,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1450, loss[loss=0.1146, beats_loss=0.006467, ecapa_loss=0.0002066, whisper_loss=0.1061, over 13748.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001538, whisper_loss=0.09037, over 3807785.54 frames. ], batch size: 57, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:55:01,678 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 from AS
2024-08-13 21:55:21,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.338e+01 2.604e+01 2.874e+01 4.710e+01, threshold=5.208e+01, percent-clipped=0.0
2024-08-13 21:55:22,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2333430.0, ans=0.1
2024-08-13 21:55:26,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.49 vs. limit=10.0
2024-08-13 21:55:36,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2333530.0, ans=0.1
2024-08-13 21:55:44,102 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 from AS
2024-08-13 21:55:44,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0
2024-08-13 21:56:00,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2333730.0, ans=0.125
2024-08-13 21:56:01,152 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1500, loss[loss=0.1054, beats_loss=0.01082, ecapa_loss=0.0001597, whisper_loss=0.09299, over 20262.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01087, ecapa_loss=0.000153, whisper_loss=0.08949, over 3784451.96 frames. ], batch size: 81, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:56:01,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.35 vs. limit=22.5
2024-08-13 21:56:05,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2333730.0, ans=0.0
2024-08-13 21:56:19,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2333830.0, ans=0.2
2024-08-13 21:56:47,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2334030.0, ans=0.0
2024-08-13 21:56:52,643 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 19 from LS+wenet, 28 from Vox, 45 from AS
2024-08-13 21:56:55,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2334030.0, ans=0.0
2024-08-13 21:56:55,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2334030.0, ans=0.125
2024-08-13 21:57:09,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2334130.0, ans=0.125
2024-08-13 21:57:14,640 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1550, loss[loss=0.1193, beats_loss=0.01014, ecapa_loss=0.0001546, whisper_loss=0.1077, over 21344.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01092, ecapa_loss=0.0001527, whisper_loss=0.08937, over 3826412.27 frames. ], batch size: 87, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:57:18,585 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 from AS
2024-08-13 21:57:20,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2024-08-13 21:57:22,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2334230.0, ans=0.125
2024-08-13 21:57:27,609 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0
2024-08-13 21:57:31,874 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 from AS
2024-08-13 21:57:41,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2334330.0, ans=0.125
2024-08-13 21:57:51,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.314e+01 2.571e+01 2.868e+01 3.932e+01, threshold=5.142e+01, percent-clipped=0.0
2024-08-13 21:57:54,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2334430.0, ans=0.125
2024-08-13 21:58:03,953 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0
2024-08-13 21:58:27,202 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 from AS
2024-08-13 21:58:29,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0
2024-08-13 21:58:29,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1600, loss[loss=0.105, beats_loss=0.008986, ecapa_loss=0.0001413, whisper_loss=0.09461, over 17344.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001531, whisper_loss=0.09004, over 3830195.18 frames. ], batch size: 64, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:59:00,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2334930.0, ans=0.125
2024-08-13 21:59:09,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2334930.0, ans=0.2
2024-08-13 21:59:13,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2335030.0, ans=0.125
2024-08-13 21:59:14,463 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS
2024-08-13 21:59:16,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2335030.0, ans=0.125
2024-08-13 21:59:26,632 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS
2024-08-13 21:59:41,593 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1650, loss[loss=0.1056, beats_loss=0.009078, ecapa_loss=0.0001811, whisper_loss=0.0947, over 21322.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.0001543, whisper_loss=0.09048, over 3830601.11 frames. ], batch size: 85, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:59:50,312 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 11 from Vox, 28 from AS
2024-08-13 21:59:52,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2335230.0, ans=0.0
2024-08-13 21:59:53,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2335230.0, ans=0.125
2024-08-13 21:59:55,166 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 27 from Vox, 33 from AS
2024-08-13 22:00:05,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2335330.0, ans=0.125
2024-08-13 22:00:15,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.351e+01 2.606e+01 2.894e+01 4.343e+01, threshold=5.211e+01, percent-clipped=0.0
2024-08-13 22:00:34,925 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 from AS
2024-08-13 22:00:41,649 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 29 from Vox, 30 from AS
2024-08-13 22:00:41,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2335630.0, ans=0.0
2024-08-13 22:00:44,250 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 from AS
2024-08-13 22:00:48,243 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 22:00:52,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1700, loss[loss=0.1108, beats_loss=0.01027, ecapa_loss=0.0001482, whisper_loss=0.09901, over 22441.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.000154, whisper_loss=0.09032, over 3818682.94 frames. ], batch size: 88, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:01:00,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2335730.0, ans=0.0
2024-08-13 22:01:19,821 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.360e-02
2024-08-13 22:01:30,570 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 24 from Vox, 19 from AS
2024-08-13 22:01:37,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=2336030.0, ans=15.0
2024-08-13 22:01:40,542 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 22:01:42,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2336030.0, ans=0.1
2024-08-13 22:01:50,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2336130.0, ans=0.2
2024-08-13 22:01:50,402 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.762e+00
2024-08-13 22:01:58,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2336130.0, ans=0.04949747468305833
2024-08-13 22:02:01,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2336230.0, ans=0.125
2024-08-13 22:02:02,561 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1750, loss[loss=0.101, beats_loss=0.009488, ecapa_loss=0.0001688, whisper_loss=0.08983, over 20042.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001556, whisper_loss=0.09041, over 3839802.71 frames. ], batch size: 82, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:02:12,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2336230.0, ans=0.0
2024-08-13 22:02:13,045 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 from AS
2024-08-13 22:02:24,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2336330.0, ans=10.0
2024-08-13 22:02:24,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2336330.0, ans=0.1
2024-08-13 22:02:35,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.335e+01 2.606e+01 3.098e+01 1.901e+02, threshold=5.212e+01, percent-clipped=3.0
2024-08-13 22:02:38,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5
2024-08-13 22:02:42,298 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 from AS
2024-08-13 22:02:43,654 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 18 from Vox, 49 from AS
2024-08-13 22:02:51,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2336530.0, ans=0.0
2024-08-13 22:02:55,035 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 from AS
2024-08-13 22:03:03,106 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 22:03:04,605 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 from AS
2024-08-13 22:03:11,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1800, loss[loss=0.1122, beats_loss=0.00934, ecapa_loss=0.0001758, whisper_loss=0.1011, over 18927.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001561, whisper_loss=0.09055, over 3834833.79 frames. ], batch size: 75, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:03:16,233 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 from AS
2024-08-13 22:03:28,051 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS
2024-08-13 22:03:38,505 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 from AS
2024-08-13 22:03:38,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2336930.0, ans=0.125
2024-08-13 22:03:51,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2337030.0, ans=0.1
2024-08-13 22:03:54,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0
2024-08-13 22:04:03,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.10 vs. limit=10.0
2024-08-13 22:04:12,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2337130.0, ans=0.125
2024-08-13 22:04:19,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2337230.0, ans=0.1
2024-08-13 22:04:20,859 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1850, loss[loss=0.1182, beats_loss=0.009088, ecapa_loss=0.0001517, whisper_loss=0.1076, over 22040.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01048, ecapa_loss=0.0001562, whisper_loss=0.09159, over 3851053.34 frames. ], batch size: 85, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:04:21,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2337230.0, ans=0.1
2024-08-13 22:04:25,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.15 vs. limit=22.5
2024-08-13 22:04:36,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2337330.0, ans=0.125
2024-08-13 22:04:52,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.324e+01 2.518e+01 2.718e+01 4.142e+01, threshold=5.036e+01, percent-clipped=0.0
2024-08-13 22:05:28,230 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 30 from Vox, 31 from AS
2024-08-13 22:05:30,318 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1900, loss[loss=0.105, beats_loss=0.008919, ecapa_loss=0.0001942, whisper_loss=0.09414, over 21350.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001574, whisper_loss=0.09104, over 3820189.04 frames. ], batch size: 88, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:05:35,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2337730.0, ans=0.0
2024-08-13 22:05:54,660 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 26 from Vox, 26 from AS
2024-08-13 22:06:00,237 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 22:06:01,919 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 30 from LS+wenet, 30 from Vox, 36 from AS
2024-08-13 22:06:02,472 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0
2024-08-13 22:06:05,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2337930.0, ans=0.0
2024-08-13 22:06:22,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0
2024-08-13 22:06:22,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0
2024-08-13 22:06:24,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2338030.0, ans=0.125
2024-08-13 22:06:36,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2338130.0, ans=0.0
2024-08-13 22:06:47,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2338130.0, ans=0.125
2024-08-13 22:06:52,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 1950, loss[loss=0.1075, beats_loss=0.01109, ecapa_loss=0.000178, whisper_loss=0.09462, over 21208.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001578, whisper_loss=0.09112, over 3821647.64 frames. ], batch size: 87, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:06:55,601 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 28 from Vox, 29 from AS
2024-08-13 22:06:59,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2338230.0, ans=0.125
2024-08-13 22:07:19,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2338330.0, ans=0.0
2024-08-13 22:07:30,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.359e+01 2.594e+01 2.893e+01 6.920e+01, threshold=5.188e+01, percent-clipped=1.0
2024-08-13 22:08:00,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.91 vs. limit=10.0
2024-08-13 22:08:00,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2338630.0, ans=0.95
2024-08-13 22:08:02,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2338630.0, ans=0.2
2024-08-13 22:08:10,052 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 from AS
2024-08-13 22:08:13,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2000, loss[loss=0.1145, beats_loss=0.01161, ecapa_loss=0.0001726, whisper_loss=0.1011, over 22941.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0105, ecapa_loss=0.0001579, whisper_loss=0.09206, over 3820431.46 frames. ], batch size: 92, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:08:29,730 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 from AS
2024-08-13 22:08:33,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.77 vs. limit=8.0
2024-08-13 22:08:40,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2338830.0, ans=0.1
2024-08-13 22:08:41,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
2024-08-13 22:08:45,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2338930.0, ans=0.2
2024-08-13 22:09:00,394 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 from AS
2024-08-13 22:09:06,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0
2024-08-13 22:09:15,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2339030.0, ans=0.125
2024-08-13 22:09:35,314 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2050, loss[loss=0.0941, beats_loss=0.0131, ecapa_loss=0.0001432, whisper_loss=0.07957, over 21588.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01053, ecapa_loss=0.0001575, whisper_loss=0.09161, over 3830595.14 frames. ], batch size: 84, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:10:13,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.646e+01 3.086e+01 1.043e+02, threshold=5.292e+01, percent-clipped=1.0
2024-08-13 22:10:20,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2339430.0, ans=0.05
2024-08-13 22:10:57,117 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2100, loss[loss=0.1117, beats_loss=0.01035, ecapa_loss=0.0001731, whisper_loss=0.09959, over 21771.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001563, whisper_loss=0.09082, over 3841097.56 frames. ], batch size: 91, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:11:08,636 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS
2024-08-13 22:11:10,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2339730.0, ans=0.125
2024-08-13 22:11:22,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2339830.0, ans=0.09899494936611666
2024-08-13 22:11:24,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2339830.0, ans=0.0
2024-08-13 22:11:24,435 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=15.0
2024-08-13 22:11:30,137 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS
2024-08-13 22:11:44,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2340030.0, ans=0.0
2024-08-13 22:11:46,225 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0
2024-08-13 22:12:01,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2340130.0, ans=0.2
2024-08-13 22:12:04,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2340130.0, ans=0.0
2024-08-13 22:12:08,936 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 22:12:11,709 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 from AS
2024-08-13 22:12:14,758 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2150, loss[loss=0.08339, beats_loss=0.01436, ecapa_loss=0.0001287, whisper_loss=0.06775, over 22637.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001569, whisper_loss=0.09086, over 3816119.50 frames. ], batch size: 91, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:12:15,884 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 38 from LS+wenet, 16 from Vox, 29 from AS
2024-08-13 22:12:20,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0
2024-08-13 22:12:29,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.96 vs. limit=10.0
2024-08-13 22:12:36,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2340330.0, ans=0.1
2024-08-13 22:12:51,105 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 from AS
2024-08-13 22:12:54,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.324e+01 2.581e+01 2.963e+01 1.302e+02, threshold=5.163e+01, percent-clipped=1.0
2024-08-13 22:13:06,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2340530.0, ans=0.0
2024-08-13 22:13:11,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=12.0
2024-08-13 22:13:12,577 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 16 from Vox, 29 from AS
2024-08-13 22:13:25,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2340630.0, ans=0.125
2024-08-13 22:13:30,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2340630.0, ans=0.1
2024-08-13 22:13:31,390 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 from AS
2024-08-13 22:13:33,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0
2024-08-13 22:13:36,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2200, loss[loss=0.07095, beats_loss=0.009417, ecapa_loss=0.0001359, whisper_loss=0.06018, over 14106.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001565, whisper_loss=0.09053, over 3835463.52 frames. ], batch size: 55, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:13:45,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2340730.0, ans=0.1
2024-08-13 22:14:12,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2340930.0, ans=0.125
2024-08-13 22:14:16,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2340930.0, ans=0.125
2024-08-13 22:14:34,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2341030.0, ans=0.125
2024-08-13 22:14:39,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2341130.0, ans=0.0
2024-08-13 22:14:57,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2250, loss[loss=0.1368, beats_loss=0.00762, ecapa_loss=0.0002397, whisper_loss=0.1268, over 21086.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001568, whisper_loss=0.0908, over 3822452.63 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:14:58,749 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 27 from Vox, 21 from AS
2024-08-13 22:15:22,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2341330.0, ans=0.125
2024-08-13 22:15:29,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2341430.0, ans=0.2
2024-08-13 22:15:35,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.396e+01 2.660e+01 2.938e+01 1.173e+02, threshold=5.320e+01, percent-clipped=2.0
2024-08-13 22:16:01,635 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS
2024-08-13 22:16:02,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0
2024-08-13 22:16:10,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2341630.0, ans=0.125
2024-08-13 22:16:16,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0
2024-08-13 22:16:18,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2341730.0, ans=0.02
2024-08-13 22:16:18,965 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2300, loss[loss=0.1258, beats_loss=0.009539, ecapa_loss=0.0001119, whisper_loss=0.1151, over 19447.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001573, whisper_loss=0.09171, over 3866922.89 frames. ], batch size: 70, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:16:33,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2341730.0, ans=0.125
2024-08-13 22:16:35,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5
2024-08-13 22:16:36,552 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 30 from Vox, 35 from AS
2024-08-13 22:16:38,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2341830.0, ans=0.0
2024-08-13 22:17:08,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2342030.0, ans=0.2
2024-08-13 22:17:24,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2342130.0, ans=0.0
2024-08-13 22:17:32,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2342130.0, ans=0.125
2024-08-13 22:17:38,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2342230.0, ans=0.1
2024-08-13 22:17:39,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2350, loss[loss=0.09404, beats_loss=0.01311, ecapa_loss=0.0001313, whisper_loss=0.07962, over 18658.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01071, ecapa_loss=0.0001577, whisper_loss=0.09224, over 3891171.00 frames. ], batch size: 72, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:18:03,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2342330.0, ans=0.0
2024-08-13 22:18:19,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.389e+01 2.636e+01 2.881e+01 1.786e+02, threshold=5.272e+01, percent-clipped=1.0
2024-08-13 22:18:32,945 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 24 from Vox, 32 from AS
2024-08-13 22:18:55,215 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS
2024-08-13 22:18:56,560 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 from AS
2024-08-13 22:19:00,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2342730.0, ans=0.125
2024-08-13 22:19:01,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2400, loss[loss=0.1156, beats_loss=0.01097, ecapa_loss=0.0001158, whisper_loss=0.1034, over 23303.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001574, whisper_loss=0.09205, over 3901387.17 frames. ], batch size: 90, lr: 3.72e-03, grad_scale: 5.764607523034235e+17
2024-08-13 22:19:07,722 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 25 from Vox, 45 from AS
2024-08-13 22:19:07,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2342730.0, ans=0.1
2024-08-13 22:19:14,543 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 from AS
2024-08-13 22:19:24,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2342830.0, ans=0.1
2024-08-13 22:19:37,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2342930.0, ans=0.2
2024-08-13 22:19:45,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5
2024-08-13 22:19:49,672 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 32 from Vox, 31 from AS
2024-08-13 22:19:51,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2343030.0, ans=0.035
2024-08-13 22:19:59,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2343030.0, ans=0.125
2024-08-13 22:20:14,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2343130.0, ans=0.0
2024-08-13 22:20:17,146 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 from AS
2024-08-13 22:20:23,388 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 11 from LS+wenet, 23 from Vox, 25 from AS
2024-08-13 22:20:24,390 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2450, loss[loss=0.06868, beats_loss=0.01166, ecapa_loss=0.0001786, whisper_loss=0.05523, over 14149.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.0001572, whisper_loss=0.09233, over 3913268.96 frames.
], batch size: 59, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:20:25,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2343230.0, ans=0.125 2024-08-13 22:20:39,668 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-13 22:20:43,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2024-08-13 22:20:53,113 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 22:21:04,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.16 vs. limit=22.5 2024-08-13 22:21:05,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.291e+01 2.587e+01 2.997e+01 1.554e+02, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:21:42,933 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 22:21:47,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2500, loss[loss=0.08689, beats_loss=0.01301, ecapa_loss=0.000149, whisper_loss=0.07239, over 22533.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001563, whisper_loss=0.09214, over 3900793.83 frames. ], batch size: 93, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:21:49,547 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 22:22:02,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.08 vs. 
limit=15.0 2024-08-13 22:22:04,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2343830.0, ans=0.0 2024-08-13 22:22:13,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2343830.0, ans=0.2 2024-08-13 22:22:15,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2343830.0, ans=0.1 2024-08-13 22:22:35,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2344030.0, ans=0.125 2024-08-13 22:22:45,069 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 22:22:48,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2344030.0, ans=0.2 2024-08-13 22:22:57,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2344130.0, ans=0.125 2024-08-13 22:23:12,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2550, loss[loss=0.1186, beats_loss=0.009018, ecapa_loss=0.0001618, whisper_loss=0.108, over 20360.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01055, ecapa_loss=0.0001569, whisper_loss=0.09294, over 3892056.40 frames. ], batch size: 80, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:23:27,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2344230.0, ans=0.125 2024-08-13 22:23:33,623 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
15 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-13 22:23:53,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.329e+01 2.677e+01 3.229e+01 5.510e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 22:23:55,040 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2024-08-13 22:23:57,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2344430.0, ans=0.125 2024-08-13 22:24:01,900 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:24:11,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2344530.0, ans=0.2 2024-08-13 22:24:13,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2344530.0, ans=0.035 2024-08-13 22:24:16,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2344530.0, ans=0.125 2024-08-13 22:24:35,880 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2600, loss[loss=0.1031, beats_loss=0.011, ecapa_loss=0.0001828, whisper_loss=0.09032, over 18626.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01059, ecapa_loss=0.0001567, whisper_loss=0.09287, over 3911572.19 frames. ], batch size: 73, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:24:51,672 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
17 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 22:25:00,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2344830.0, ans=0.0 2024-08-13 22:25:20,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2344930.0, ans=0.125 2024-08-13 22:25:32,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2345030.0, ans=0.125 2024-08-13 22:25:34,804 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 22:25:42,666 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 22:25:43,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2024-08-13 22:25:54,393 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2650, loss[loss=0.1029, beats_loss=0.00965, ecapa_loss=0.0001445, whisper_loss=0.0918, over 15500.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001575, whisper_loss=0.09126, over 3903820.76 frames. ], batch size: 59, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:26:26,088 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 22:26:31,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.316e+01 2.514e+01 2.879e+01 4.241e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-13 22:26:37,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2345430.0, ans=0.2 2024-08-13 22:26:39,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2345430.0, ans=0.1 2024-08-13 22:26:43,299 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-13 22:26:58,640 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:27:04,700 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 22:27:05,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2345630.0, ans=0.2 2024-08-13 22:27:13,546 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2700, loss[loss=0.1101, beats_loss=0.01127, ecapa_loss=0.0001333, whisper_loss=0.09745, over 20281.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001575, whisper_loss=0.09091, over 3889465.41 frames. ], batch size: 79, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:27:14,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2345730.0, ans=0.125 2024-08-13 22:27:21,674 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
25 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 22:27:25,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2345730.0, ans=0.125 2024-08-13 22:27:26,522 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-13 22:27:41,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2345830.0, ans=0.125 2024-08-13 22:27:50,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2345930.0, ans=0.0 2024-08-13 22:27:56,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2345930.0, ans=0.125 2024-08-13 22:27:57,385 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 22:28:07,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2346030.0, ans=10.0 2024-08-13 22:28:27,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2346130.0, ans=0.125 2024-08-13 22:28:28,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2346130.0, ans=0.1 2024-08-13 22:28:32,292 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2750, loss[loss=0.07678, beats_loss=0.0126, ecapa_loss=0.0001043, whisper_loss=0.06313, over 15363.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001582, whisper_loss=0.09069, over 3883257.91 frames. ], batch size: 56, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:28:37,378 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 22:28:39,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2346230.0, ans=0.2 2024-08-13 22:28:48,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2346330.0, ans=0.1 2024-08-13 22:28:51,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2346330.0, ans=0.125 2024-08-13 22:29:00,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2346330.0, ans=0.125 2024-08-13 22:29:11,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.410e+01 2.665e+01 3.029e+01 5.908e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 22:29:12,129 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 22:29:17,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2346430.0, ans=0.125 2024-08-13 22:29:26,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2346530.0, ans=0.0 2024-08-13 22:29:28,968 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 22:29:32,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2346530.0, ans=0.0 2024-08-13 22:29:50,600 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2800, loss[loss=0.1036, beats_loss=0.01009, ecapa_loss=0.0001655, whisper_loss=0.09188, over 22487.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001578, whisper_loss=0.0908, over 3898341.68 frames. 
], batch size: 87, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:29:58,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=22.5 2024-08-13 22:30:01,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2346730.0, ans=0.1 2024-08-13 22:30:34,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2346930.0, ans=0.125 2024-08-13 22:30:54,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2347030.0, ans=0.2 2024-08-13 22:30:59,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2347130.0, ans=0.1 2024-08-13 22:31:02,790 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 22:31:15,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2850, loss[loss=0.1261, beats_loss=0.008458, ecapa_loss=0.0001486, whisper_loss=0.1161, over 18061.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001591, whisper_loss=0.091, over 3862208.30 frames. ], batch size: 67, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:31:17,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2347230.0, ans=0.125 2024-08-13 22:31:28,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2347230.0, ans=0.1 2024-08-13 22:31:51,613 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 22:31:52,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.368e+01 2.681e+01 3.083e+01 7.841e+01, threshold=5.363e+01, percent-clipped=3.0 2024-08-13 22:31:56,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2347430.0, ans=10.0 2024-08-13 22:32:03,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2347530.0, ans=0.0 2024-08-13 22:32:43,691 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2900, loss[loss=0.08345, beats_loss=0.01223, ecapa_loss=0.0001384, whisper_loss=0.06983, over 23852.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001587, whisper_loss=0.09109, over 3897828.37 frames. ], batch size: 96, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:32:45,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2024-08-13 22:32:54,024 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 22:33:21,691 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.176e+00 2024-08-13 22:34:09,684 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 22:34:13,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2348130.0, ans=0.125 2024-08-13 22:34:19,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2348130.0, ans=0.1 2024-08-13 22:34:31,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 2950, loss[loss=0.09855, beats_loss=0.01135, ecapa_loss=0.0001528, whisper_loss=0.08568, over 21186.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.000159, whisper_loss=0.09122, over 3928769.68 frames. ], batch size: 85, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:35:29,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.427e+01 2.649e+01 3.118e+01 1.077e+02, threshold=5.298e+01, percent-clipped=4.0 2024-08-13 22:36:01,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.64 vs. limit=22.5 2024-08-13 22:36:27,965 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 22:36:37,577 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3000, loss[loss=0.08977, beats_loss=0.01063, ecapa_loss=0.0001917, whisper_loss=0.07723, over 17287.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001592, whisper_loss=0.09107, over 3927298.54 frames. ], batch size: 72, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:36:37,577 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 22:37:40,797 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005533, whisper_loss=0.2471, over 922467.00 frames. 
2024-08-13 22:38:04,690 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames. 2024-08-13 22:40:23,770 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.3047, 2.6389, 2.8574, 2.7706], device='cuda:1') 2024-08-13 22:41:12,721 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 22:41:12,724 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-13 22:41:31,965 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 22:41:32,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2348830.0, ans=0.125 2024-08-13 22:42:04,046 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 22:42:17,092 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-13 22:42:27,052 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 22:42:35,591 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-13 22:42:41,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3050, loss[loss=0.1005, beats_loss=0.01062, ecapa_loss=0.0001745, whisper_loss=0.08816, over 20431.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001606, whisper_loss=0.09196, over 3926841.39 frames. ], batch size: 85, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:43:02,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.41 vs. 
limit=22.5 2024-08-13 22:43:23,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2349430.0, ans=0.0 2024-08-13 22:43:25,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.432e+01 2.716e+01 3.181e+01 1.148e+02, threshold=5.433e+01, percent-clipped=2.0 2024-08-13 22:43:36,706 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 22:43:37,903 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 22:43:48,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-13 22:43:50,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2349530.0, ans=0.125 2024-08-13 22:43:50,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=12.0 2024-08-13 22:44:11,973 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3100, loss[loss=0.1038, beats_loss=0.01111, ecapa_loss=0.0001426, whisper_loss=0.09121, over 17115.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01063, ecapa_loss=0.0001622, whisper_loss=0.09222, over 3921269.06 frames. ], batch size: 65, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:44:21,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2349730.0, ans=0.0 2024-08-13 22:45:03,066 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:45:13,152 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 22:45:19,145 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
18 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 22:45:37,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3150, loss[loss=0.1138, beats_loss=0.0109, ecapa_loss=0.0001732, whisper_loss=0.1012, over 22663.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01072, ecapa_loss=0.0001612, whisper_loss=0.09175, over 3921896.37 frames. ], batch size: 90, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:45:43,617 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-13 22:45:52,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2350230.0, ans=0.0 2024-08-13 22:45:58,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2350330.0, ans=0.04949747468305833 2024-08-13 22:46:05,162 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 22:46:08,601 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 22:46:08,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2350330.0, ans=0.125 2024-08-13 22:46:13,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2350430.0, ans=0.125 2024-08-13 22:46:17,364 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 22:46:20,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.358e+01 2.601e+01 2.838e+01 4.154e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 22:46:37,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2350530.0, ans=0.1 2024-08-13 22:47:02,079 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 22:47:07,221 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3200, loss[loss=0.09177, beats_loss=0.01156, ecapa_loss=0.000136, whisper_loss=0.07885, over 22397.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.000161, whisper_loss=0.09145, over 3911005.78 frames. ], batch size: 90, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:47:10,975 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 22:47:11,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2350730.0, ans=0.1 2024-08-13 22:47:13,453 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 22:47:13,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2350730.0, ans=0.125 2024-08-13 22:47:15,871 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.178e+00 2024-08-13 22:47:17,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2350730.0, ans=0.0 2024-08-13 22:47:29,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-13 22:47:44,838 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 22:48:33,482 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-13 22:48:33,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2351130.0, ans=0.125 2024-08-13 22:48:37,597 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3250, loss[loss=0.09959, beats_loss=0.01189, ecapa_loss=0.0001537, whisper_loss=0.08616, over 21551.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001608, whisper_loss=0.09163, over 3875371.27 frames. ], batch size: 88, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:48:49,119 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-13 22:49:00,021 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 22:49:14,296 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-13 22:49:17,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2351430.0, ans=0.125 2024-08-13 22:49:19,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.385e+01 2.597e+01 2.999e+01 7.217e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-13 22:49:30,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2351530.0, ans=0.0 2024-08-13 22:49:48,790 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 22:50:05,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3300, loss[loss=0.11, beats_loss=0.00908, ecapa_loss=0.0001656, whisper_loss=0.09928, over 21896.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001601, whisper_loss=0.09059, over 3849725.69 frames. 
], batch size: 88, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:50:25,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2351830.0, ans=0.1 2024-08-13 22:50:29,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0 2024-08-13 22:50:32,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2351830.0, ans=0.125 2024-08-13 22:50:32,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2351830.0, ans=0.125 2024-08-13 22:50:33,317 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 22:50:39,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0 2024-08-13 22:50:49,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2351930.0, ans=0.0 2024-08-13 22:50:51,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2351930.0, ans=0.1 2024-08-13 22:51:07,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2352030.0, ans=0.1 2024-08-13 22:51:11,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2352030.0, ans=0.125 2024-08-13 22:51:30,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3350, loss[loss=0.1002, beats_loss=0.01218, ecapa_loss=0.0001787, whisper_loss=0.08628, over 21634.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01078, ecapa_loss=0.0001595, whisper_loss=0.09111, over 3866988.33 frames. ], batch size: 94, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:51:43,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2352230.0, ans=0.125 2024-08-13 22:51:55,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2352330.0, ans=0.0 2024-08-13 22:52:10,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2352430.0, ans=0.1 2024-08-13 22:52:11,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.332e+01 2.587e+01 3.048e+01 7.749e+01, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:52:32,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2352530.0, ans=0.125 2024-08-13 22:52:35,472 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-13 22:52:54,066 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-13 22:52:56,428 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3400, loss[loss=0.09547, beats_loss=0.01117, ecapa_loss=0.0001574, whisper_loss=0.08272, over 17726.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001585, whisper_loss=0.09138, over 3910876.41 frames. ], batch size: 70, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:52:59,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.73 vs. 
limit=15.0 2024-08-13 22:53:11,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2352730.0, ans=0.125 2024-08-13 22:53:19,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2352830.0, ans=0.0 2024-08-13 22:53:27,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2352830.0, ans=0.125 2024-08-13 22:53:43,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2352930.0, ans=0.125 2024-08-13 22:53:55,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2353030.0, ans=0.1 2024-08-13 22:53:57,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=12.0 2024-08-13 22:53:58,190 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 22:54:02,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2353030.0, ans=0.04949747468305833 2024-08-13 22:54:07,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2353130.0, ans=0.125 2024-08-13 22:54:19,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2353130.0, ans=0.04949747468305833 2024-08-13 22:54:26,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3450, loss[loss=0.1167, beats_loss=0.009965, ecapa_loss=0.0001869, whisper_loss=0.1049, over 22878.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001592, whisper_loss=0.09048, over 3918151.63 frames. 
], batch size: 93, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:54:31,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2353230.0, ans=10.0 2024-08-13 22:54:42,186 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 22:55:02,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2353430.0, ans=0.125 2024-08-13 22:55:09,099 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.393e+01 2.606e+01 2.901e+01 5.659e+01, threshold=5.211e+01, percent-clipped=1.0 2024-08-13 22:55:09,510 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 22:55:11,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=2353430.0, ans=0.1 2024-08-13 22:55:16,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2353430.0, ans=0.125 2024-08-13 22:55:30,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2353530.0, ans=0.125 2024-08-13 22:55:48,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2353630.0, ans=0.125 2024-08-13 22:55:52,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3500, loss[loss=0.1182, beats_loss=0.01093, ecapa_loss=0.000142, whisper_loss=0.1059, over 22655.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01086, ecapa_loss=0.0001601, whisper_loss=0.08991, over 3902762.13 frames. ], batch size: 90, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:55:56,854 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 22:56:00,390 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 22:56:24,638 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 22:56:31,171 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 22:56:37,224 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 22:57:12,410 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 22:57:15,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3550, loss[loss=0.1062, beats_loss=0.008621, ecapa_loss=0.0001643, whisper_loss=0.09598, over 16902.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01081, ecapa_loss=0.000159, whisper_loss=0.08981, over 3871444.32 frames. ], batch size: 66, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:57:29,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2354230.0, ans=0.0 2024-08-13 22:57:29,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2024-08-13 22:57:45,066 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
19 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-13 22:57:45,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2354330.0, ans=0.2 2024-08-13 22:57:54,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.375e+01 2.617e+01 2.958e+01 4.205e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 22:57:55,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2354430.0, ans=0.2 2024-08-13 22:57:57,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2354430.0, ans=0.2 2024-08-13 22:58:05,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.41 vs. limit=15.0 2024-08-13 22:58:30,654 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 22:58:30,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2354630.0, ans=0.125 2024-08-13 22:58:36,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3600, loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001463, whisper_loss=0.09163, over 19340.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0108, ecapa_loss=0.0001595, whisper_loss=0.08964, over 3855712.62 frames. ], batch size: 76, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:58:42,227 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
22 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 22:58:44,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2354730.0, ans=0.0 2024-08-13 22:58:50,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2354730.0, ans=0.1 2024-08-13 22:58:55,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2024-08-13 22:59:14,991 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 22:59:16,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2354930.0, ans=0.2 2024-08-13 22:59:38,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2355030.0, ans=0.0 2024-08-13 22:59:43,616 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-13 22:59:46,798 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 22:59:48,629 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-13 22:59:56,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3650, loss[loss=0.08403, beats_loss=0.01287, ecapa_loss=0.0001658, whisper_loss=0.0695, over 17690.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01078, ecapa_loss=0.0001604, whisper_loss=0.08981, over 3816797.64 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:00:12,353 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 23:00:16,576 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 23:00:18,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2355330.0, ans=0.0 2024-08-13 23:00:34,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.445e+01 2.700e+01 3.239e+01 5.632e+01, threshold=5.401e+01, percent-clipped=1.0 2024-08-13 23:00:36,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2355430.0, ans=0.125 2024-08-13 23:00:54,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2355530.0, ans=0.125 2024-08-13 23:00:57,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2355530.0, ans=0.125 2024-08-13 23:00:57,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2355530.0, ans=0.125 2024-08-13 23:01:05,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2355630.0, ans=0.0 2024-08-13 23:01:15,933 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3700, loss[loss=0.1197, beats_loss=0.009854, ecapa_loss=0.0001885, whisper_loss=0.1079, over 22876.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01083, ecapa_loss=0.0001601, whisper_loss=0.08956, over 3819589.64 frames. ], batch size: 92, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:01:16,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=12.0 2024-08-13 23:01:17,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2355730.0, ans=0.0 2024-08-13 23:01:21,926 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 23:01:44,874 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.408e+01 2024-08-13 23:01:45,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.29 vs. limit=22.5 2024-08-13 23:01:46,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2355930.0, ans=0.0 2024-08-13 23:01:51,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2355930.0, ans=0.125 2024-08-13 23:01:55,333 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 23:02:24,348 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 23:02:34,223 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3750, loss[loss=0.08986, beats_loss=0.01087, ecapa_loss=0.0001901, whisper_loss=0.07709, over 19391.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001601, whisper_loss=0.09037, over 3818188.52 frames. ], batch size: 85, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:02:43,731 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 23:03:00,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2356330.0, ans=0.2 2024-08-13 23:03:07,547 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
31 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 23:03:09,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2356430.0, ans=15.0 2024-08-13 23:03:10,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.349e+01 2.622e+01 2.917e+01 8.940e+01, threshold=5.244e+01, percent-clipped=1.0 2024-08-13 23:03:21,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-13 23:03:31,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-08-13 23:03:35,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2356630.0, ans=0.125 2024-08-13 23:03:43,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2356630.0, ans=0.125 2024-08-13 23:03:49,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3800, loss[loss=0.1002, beats_loss=0.01122, ecapa_loss=0.0001284, whisper_loss=0.08773, over 15680.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01083, ecapa_loss=0.0001599, whisper_loss=0.08969, over 3803249.48 frames. ], batch size: 58, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:04:06,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2356830.0, ans=0.125 2024-08-13 23:04:11,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2356830.0, ans=0.07 2024-08-13 23:04:22,048 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 23:04:38,919 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 23:04:42,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2357030.0, ans=0.125 2024-08-13 23:04:43,693 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 23:04:47,102 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 23:04:47,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2357030.0, ans=0.125 2024-08-13 23:04:47,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2357030.0, ans=0.0 2024-08-13 23:04:50,703 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 23:05:01,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2357130.0, ans=0.125 2024-08-13 23:05:07,154 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3850, loss[loss=0.1169, beats_loss=0.008688, ecapa_loss=0.0002027, whisper_loss=0.1062, over 21357.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01085, ecapa_loss=0.0001598, whisper_loss=0.08984, over 3818354.15 frames. 
], batch size: 88, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:05:28,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2357330.0, ans=0.0 2024-08-13 23:05:33,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2357330.0, ans=0.125 2024-08-13 23:05:35,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2357330.0, ans=0.125 2024-08-13 23:05:44,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.323e+01 2.537e+01 2.804e+01 4.147e+01, threshold=5.073e+01, percent-clipped=0.0 2024-08-13 23:05:47,259 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 23:05:53,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2357530.0, ans=0.1 2024-08-13 23:05:53,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2357530.0, ans=0.125 2024-08-13 23:06:10,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2357630.0, ans=0.1 2024-08-13 23:06:16,714 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 23:06:23,324 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3900, loss[loss=0.1261, beats_loss=0.008813, ecapa_loss=0.0001584, whisper_loss=0.1157, over 22782.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001612, whisper_loss=0.09101, over 3859916.67 frames. 
], batch size: 90, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:07:24,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2358130.0, ans=0.0 2024-08-13 23:07:26,700 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 23:07:26,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2358130.0, ans=0.125 2024-08-13 23:07:32,937 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 23:07:39,928 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 23:07:41,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 3950, loss[loss=0.1136, beats_loss=0.008949, ecapa_loss=0.0001799, whisper_loss=0.1029, over 17744.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001623, whisper_loss=0.09135, over 3862252.67 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:07:43,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.96 vs. limit=22.5 2024-08-13 23:07:57,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2358330.0, ans=0.0 2024-08-13 23:08:09,423 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-13 23:08:16,642 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 23:08:20,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.511e+01 2.750e+01 3.070e+01 4.670e+01, threshold=5.499e+01, percent-clipped=0.0 2024-08-13 23:08:22,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2358430.0, ans=0.125 2024-08-13 23:08:33,485 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2024-08-13 23:08:48,421 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-13 23:08:57,430 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4000, loss[loss=0.1211, beats_loss=0.008888, ecapa_loss=0.0001806, whisper_loss=0.1104, over 22435.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01064, ecapa_loss=0.000163, whisper_loss=0.09233, over 3853341.67 frames. ], batch size: 90, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:09:26,582 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-13 23:09:26,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2358930.0, ans=0.2 2024-08-13 23:09:37,012 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 23:09:53,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2359030.0, ans=0.0 2024-08-13 23:10:15,205 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4050, loss[loss=0.1066, beats_loss=0.008977, ecapa_loss=0.0001675, whisper_loss=0.09596, over 20092.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01065, ecapa_loss=0.0001622, whisper_loss=0.0921, over 3839052.04 frames. 
], batch size: 78, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:10:15,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2359230.0, ans=0.125 2024-08-13 23:10:15,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2024-08-13 23:10:30,205 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 23:10:41,770 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 36 from Vox, 34 fro AS 2024-08-13 23:10:44,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2359430.0, ans=0.2 2024-08-13 23:10:51,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.408e+01 2.659e+01 2.975e+01 6.287e+01, threshold=5.318e+01, percent-clipped=1.0 2024-08-13 23:10:58,479 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.455e+05 2024-08-13 23:11:09,909 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 23:11:29,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4100, loss[loss=0.1089, beats_loss=0.01115, ecapa_loss=0.0001806, whisper_loss=0.096, over 22450.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01068, ecapa_loss=0.0001644, whisper_loss=0.09199, over 3843007.56 frames. ], batch size: 94, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:11:41,963 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 23:11:43,183 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 23:12:02,651 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 23:12:18,319 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 23:12:31,568 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 23:12:48,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4150, loss[loss=0.09102, beats_loss=0.011, ecapa_loss=0.0001516, whisper_loss=0.0785, over 20566.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0107, ecapa_loss=0.0001634, whisper_loss=0.09227, over 3848524.89 frames. ], batch size: 83, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:12:54,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2360230.0, ans=0.07 2024-08-13 23:12:54,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2360230.0, ans=0.125 2024-08-13 23:12:57,406 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-13 23:13:01,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2360230.0, ans=10.0 2024-08-13 23:13:14,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2360330.0, ans=0.2 2024-08-13 23:13:18,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2360430.0, ans=0.0 2024-08-13 23:13:25,973 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.420e+01 2.616e+01 2.987e+01 7.044e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 23:13:31,134 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 23:13:32,387 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 23:13:49,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2360630.0, ans=0.0 2024-08-13 23:14:02,980 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4200, loss[loss=0.09237, beats_loss=0.009555, ecapa_loss=0.0001967, whisper_loss=0.08085, over 21488.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01071, ecapa_loss=0.000163, whisper_loss=0.0924, over 3888453.82 frames. ], batch size: 91, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:14:10,607 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 23 from LS+wenet, 34 from Vox, 38 fro AS 2024-08-13 23:14:11,937 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 23:14:18,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=15.0 2024-08-13 23:14:34,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2360930.0, ans=10.0 2024-08-13 23:14:39,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2360930.0, ans=0.125 2024-08-13 23:14:44,806 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 23:15:12,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4250, loss[loss=0.09198, beats_loss=0.01278, ecapa_loss=0.0001648, whisper_loss=0.07756, over 15253.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01074, ecapa_loss=0.0001615, whisper_loss=0.09232, over 3910292.84 frames. 
], batch size: 63, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:15:17,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2361230.0, ans=0.125 2024-08-13 23:15:28,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2361330.0, ans=0.1 2024-08-13 23:15:44,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.294e+01 2.587e+01 2.870e+01 6.296e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-13 23:15:46,305 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-13 23:15:47,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0 2024-08-13 23:15:50,255 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 23:15:52,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. 
limit=15.0 2024-08-13 23:15:57,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2361530.0, ans=0.2 2024-08-13 23:16:09,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2361630.0, ans=0.05 2024-08-13 23:16:12,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2361630.0, ans=0.1 2024-08-13 23:16:14,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2361630.0, ans=0.2 2024-08-13 23:16:17,377 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4300, loss[loss=0.08158, beats_loss=0.009911, ecapa_loss=0.0001689, whisper_loss=0.06998, over 14849.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001605, whisper_loss=0.09177, over 3906237.36 frames. ], batch size: 60, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:17:09,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2024-08-13 23:17:20,932 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 23:17:45,652 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-13 23:17:50,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-08-13 23:17:53,994 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 23:17:56,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2362130.0, ans=0.0 2024-08-13 23:17:57,696 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.165e+05 2024-08-13 23:18:03,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2362130.0, ans=0.125 2024-08-13 23:18:13,047 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4350, loss[loss=0.08649, beats_loss=0.01403, ecapa_loss=0.0001681, whisper_loss=0.07078, over 21712.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001608, whisper_loss=0.09186, over 3884397.11 frames. ], batch size: 93, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:18:17,250 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 23:18:29,517 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-13 23:18:38,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2362330.0, ans=0.5 2024-08-13 23:18:52,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.337e+01 2.576e+01 3.012e+01 4.056e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-13 23:18:56,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=22.5 2024-08-13 23:19:06,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. 
limit=22.5 2024-08-13 23:19:21,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2362630.0, ans=0.1 2024-08-13 23:19:32,550 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 from AS 2024-08-13 23:19:33,837 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4400, loss[loss=0.1096, beats_loss=0.007783, ecapa_loss=0.0001762, whisper_loss=0.1001, over 16372.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001614, whisper_loss=0.09075, over 3863812.23 frames. ], batch size: 65, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:19:35,403 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 21 from Vox, 46 from AS 2024-08-13 23:19:53,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2362830.0, ans=0.125 2024-08-13 23:20:04,258 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 from AS 2024-08-13 23:20:05,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2362930.0, ans=0.0 2024-08-13 23:20:08,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2362930.0, ans=0.1 2024-08-13 23:20:13,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2362930.0, ans=0.125 2024-08-13 23:20:29,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2363030.0, ans=0.0 2024-08-13 23:20:33,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-13 23:20:43,229 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
20 from LS+wenet, 25 from Vox, 46 from AS 2024-08-13 23:20:44,256 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 24 from Vox, 36 from AS 2024-08-13 23:20:48,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4450, loss[loss=0.08917, beats_loss=0.0122, ecapa_loss=0.0001103, whisper_loss=0.07587, over 16092.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.000161, whisper_loss=0.09037, over 3861425.45 frames. ], batch size: 61, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:20:48,849 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 from AS 2024-08-13 23:20:55,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2363230.0, ans=0.0 2024-08-13 23:21:15,739 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 12 from Vox, 28 from AS 2024-08-13 23:21:28,618 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.409e+01 2.664e+01 2.942e+01 4.100e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 23:21:29,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2363430.0, ans=0.125 2024-08-13 23:21:33,836 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 from AS 2024-08-13 23:22:09,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4500, loss[loss=0.1166, beats_loss=0.008989, ecapa_loss=0.0001753, whisper_loss=0.1059, over 15042.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01078, ecapa_loss=0.0001601, whisper_loss=0.09027, over 3845339.60 frames. 
], batch size: 58, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:22:10,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2363730.0, ans=0.1 2024-08-13 23:22:10,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=15.0 2024-08-13 23:22:24,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2363830.0, ans=0.125 2024-08-13 23:23:10,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2364130.0, ans=0.125 2024-08-13 23:23:11,674 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 from AS 2024-08-13 23:23:13,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2364130.0, ans=0.0 2024-08-13 23:23:18,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2364130.0, ans=0.2 2024-08-13 23:23:24,678 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4550, loss[loss=0.09541, beats_loss=0.01155, ecapa_loss=0.0001189, whisper_loss=0.08267, over 16809.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001609, whisper_loss=0.09062, over 3862950.80 frames. 
], batch size: 62, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:23:29,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2364230.0, ans=0.1 2024-08-13 23:24:00,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.368e+01 2.686e+01 2.952e+01 5.692e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-13 23:24:01,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2364430.0, ans=0.0 2024-08-13 23:24:07,597 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 from AS 2024-08-13 23:24:18,408 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 20 from Vox, 19 from AS 2024-08-13 23:24:23,709 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 29 from Vox, 34 from AS 2024-08-13 23:24:25,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2364630.0, ans=0.1 2024-08-13 23:24:33,931 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4600, loss[loss=0.07935, beats_loss=0.01303, ecapa_loss=0.0001595, whisper_loss=0.06472, over 18910.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01081, ecapa_loss=0.0001621, whisper_loss=0.08942, over 3846094.32 frames. ], batch size: 82, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:24:55,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2364830.0, ans=0.0 2024-08-13 23:24:58,845 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 18 from Vox, 23 from AS 2024-08-13 23:25:04,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2364930.0, ans=0.125 2024-08-13 23:25:05,306 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 20 from Vox, 24 from AS 2024-08-13 23:25:16,568 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 from AS 2024-08-13 23:25:22,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2365030.0, ans=0.125 2024-08-13 23:25:28,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-08-13 23:25:32,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2365130.0, ans=0.125 2024-08-13 23:25:33,354 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 from AS 2024-08-13 23:25:41,629 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4650, loss[loss=0.0865, beats_loss=0.01075, ecapa_loss=0.0001966, whisper_loss=0.07379, over 17414.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001627, whisper_loss=0.09002, over 3832127.57 frames. ], batch size: 71, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:25:42,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2365230.0, ans=0.0 2024-08-13 23:25:46,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2365230.0, ans=0.0 2024-08-13 23:25:54,384 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
21 from LS+wenet, 19 from Vox, 26 from AS 2024-08-13 23:26:00,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2365330.0, ans=0.1 2024-08-13 23:26:06,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2365330.0, ans=0.2 2024-08-13 23:26:09,750 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 from AS 2024-08-13 23:26:15,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.449e+01 2.734e+01 2.969e+01 1.115e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-13 23:26:15,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2365430.0, ans=0.125 2024-08-13 23:26:43,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. limit=10.0 2024-08-13 23:26:44,039 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS 2024-08-13 23:26:47,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4700, loss[loss=0.1052, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.09334, over 22891.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001623, whisper_loss=0.0906, over 3840092.47 frames. ], batch size: 89, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:27:00,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2024-08-13 23:27:12,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2365930.0, ans=0.07 2024-08-13 23:27:20,400 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-13 23:27:24,470 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-13 23:27:28,284 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 23 from Vox, 19 from AS 2024-08-13 23:27:44,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2366130.0, ans=0.025 2024-08-13 23:27:49,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2366130.0, ans=0.0 2024-08-13 23:27:52,739 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4750, loss[loss=0.1044, beats_loss=0.008301, ecapa_loss=0.0002176, whisper_loss=0.0939, over 21383.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001618, whisper_loss=0.09049, over 3856845.74 frames. ], batch size: 92, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:27:56,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2366230.0, ans=0.0 2024-08-13 23:28:12,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2366330.0, ans=0.0 2024-08-13 23:28:23,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2366430.0, ans=0.0 2024-08-13 23:28:25,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.418e+01 2.670e+01 2.931e+01 4.166e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-13 23:28:27,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.41 vs. 
limit=22.5 2024-08-13 23:28:27,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-13 23:28:33,005 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 from AS 2024-08-13 23:28:34,276 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 from AS 2024-08-13 23:28:52,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2366630.0, ans=0.1 2024-08-13 23:28:56,794 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 23:28:57,840 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4800, loss[loss=0.09685, beats_loss=0.01249, ecapa_loss=0.0001958, whisper_loss=0.0824, over 18366.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001621, whisper_loss=0.09146, over 3883815.34 frames. ], batch size: 78, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:28:58,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2366730.0, ans=0.0 2024-08-13 23:29:01,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2366730.0, ans=0.95 2024-08-13 23:29:06,963 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 23:29:22,649 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 18 from LS+wenet, 25 from Vox, 38 from AS 2024-08-13 23:29:32,959 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 from AS 2024-08-13 23:29:45,679 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 36 from Vox, 25 from AS 2024-08-13 23:29:46,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2367030.0, ans=0.02 2024-08-13 23:30:02,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4850, loss[loss=0.1161, beats_loss=0.008625, ecapa_loss=0.0001913, whisper_loss=0.1056, over 20521.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001621, whisper_loss=0.09137, over 3902984.06 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:30:13,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2367230.0, ans=0.125 2024-08-13 23:30:21,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2367330.0, ans=0.125 2024-08-13 23:30:24,512 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 from AS 2024-08-13 23:30:30,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2367430.0, ans=0.0 2024-08-13 23:30:35,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.351e+01 2.637e+01 2.912e+01 5.043e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-13 23:30:58,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2367630.0, ans=0.0 2024-08-13 23:31:02,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2367630.0, ans=0.1 2024-08-13 23:31:07,489 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4900, loss[loss=0.08917, beats_loss=0.01129, ecapa_loss=0.0001207, whisper_loss=0.07667, over 21853.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01072, ecapa_loss=0.0001614, whisper_loss=0.09187, over 3909558.06 frames. ], batch size: 84, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:31:07,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2367730.0, ans=0.0 2024-08-13 23:31:23,782 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 25 from Vox, 34 from AS 2024-08-13 23:31:34,194 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 from AS 2024-08-13 23:31:35,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2367930.0, ans=0.1 2024-08-13 23:31:49,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2368030.0, ans=0.2 2024-08-13 23:31:50,912 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 from AS 2024-08-13 23:32:01,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=2368130.0, ans=0.02 2024-08-13 23:32:04,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2368130.0, ans=0.125 2024-08-13 23:32:10,898 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 12 from Vox, 24 from AS 2024-08-13 23:32:11,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2024-08-13 23:32:12,857 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.19 vs. 
limit=15.0 2024-08-13 23:32:13,291 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 4950, loss[loss=0.1135, beats_loss=0.01071, ecapa_loss=0.0001906, whisper_loss=0.1009, over 23250.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.0001612, whisper_loss=0.09148, over 3882905.82 frames. ], batch size: 94, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:32:15,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2368230.0, ans=0.125 2024-08-13 23:32:16,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2368230.0, ans=0.125 2024-08-13 23:32:19,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2368230.0, ans=0.125 2024-08-13 23:32:19,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2368230.0, ans=0.1 2024-08-13 23:32:29,937 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 23:32:46,506 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.294e+01 2.547e+01 2.845e+01 3.862e+01, threshold=5.095e+01, percent-clipped=0.0 2024-08-13 23:32:55,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2368530.0, ans=0.125 2024-08-13 23:33:01,510 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.937e+01 2024-08-13 23:33:14,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2368630.0, ans=0.1 2024-08-13 23:33:18,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2368730.0, ans=0.125 2024-08-13 23:33:19,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5000, loss[loss=0.1096, beats_loss=0.01081, ecapa_loss=0.0001363, whisper_loss=0.09747, over 15548.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01066, ecapa_loss=0.0001603, whisper_loss=0.09232, over 3887638.36 frames. ], batch size: 59, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:33:19,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2368730.0, ans=0.2 2024-08-13 23:33:21,654 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 15 from Vox, 34 from AS 2024-08-13 23:33:25,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2368730.0, ans=0.0 2024-08-13 23:33:37,353 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 27 from Vox, 22 from AS 2024-08-13 23:33:43,793 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-13 23:33:45,885 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
30 from LS+wenet, 22 from Vox, 43 from AS 2024-08-13 23:34:00,261 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 from AS 2024-08-13 23:34:10,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2369130.0, ans=0.125 2024-08-13 23:34:15,850 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 from AS 2024-08-13 23:34:18,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2024-08-13 23:34:20,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=12.0 2024-08-13 23:34:23,191 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5050, loss[loss=0.1051, beats_loss=0.01052, ecapa_loss=0.0001707, whisper_loss=0.09288, over 18014.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001599, whisper_loss=0.09191, over 3905163.87 frames. ], batch size: 73, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:34:29,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2369230.0, ans=0.125 2024-08-13 23:34:55,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.306e+01 2.530e+01 2.921e+01 5.103e+01, threshold=5.061e+01, percent-clipped=1.0 2024-08-13 23:34:57,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2369430.0, ans=0.5 2024-08-13 23:35:06,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2024-08-13 23:35:17,882 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
17 from LS+wenet, 19 from Vox, 30 from AS 2024-08-13 23:35:27,800 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5100, loss[loss=0.1034, beats_loss=0.0111, ecapa_loss=0.0001828, whisper_loss=0.09052, over 20822.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001596, whisper_loss=0.09181, over 3891843.43 frames. ], batch size: 88, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:35:35,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-08-13 23:35:35,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2369730.0, ans=0.0 2024-08-13 23:36:08,938 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 26 from Vox, 35 from AS 2024-08-13 23:36:32,310 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5150, loss[loss=0.1268, beats_loss=0.007728, ecapa_loss=0.000192, whisper_loss=0.1171, over 20601.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01092, ecapa_loss=0.0001582, whisper_loss=0.09093, over 3901390.25 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:36:41,859 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
25 from LS+wenet, 22 from Vox, 19 from AS 2024-08-13 23:36:42,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2370230.0, ans=0.1 2024-08-13 23:36:56,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2370330.0, ans=0.2 2024-08-13 23:36:57,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2370430.0, ans=0.0 2024-08-13 23:37:05,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.435e+01 2.636e+01 3.072e+01 5.034e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-13 23:37:10,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2370530.0, ans=0.0 2024-08-13 23:37:37,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5200, loss[loss=0.1099, beats_loss=0.01299, ecapa_loss=0.0001725, whisper_loss=0.0952, over 19773.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01088, ecapa_loss=0.0001588, whisper_loss=0.0912, over 3908153.45 frames. ], batch size: 82, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:37:40,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. 
limit=6.0 2024-08-13 23:37:41,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2370730.0, ans=0.2 2024-08-13 23:37:45,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2370730.0, ans=0.125 2024-08-13 23:37:46,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2370730.0, ans=0.125 2024-08-13 23:37:50,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2370830.0, ans=0.125 2024-08-13 23:37:56,362 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 from AS 2024-08-13 23:38:07,839 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 14 from Vox, 46 from AS 2024-08-13 23:38:11,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2370930.0, ans=0.1 2024-08-13 23:38:23,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2371030.0, ans=0.1 2024-08-13 23:38:30,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2371130.0, ans=0.125 2024-08-13 23:38:32,781 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-13 23:38:40,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5250, loss[loss=0.1095, beats_loss=0.01138, ecapa_loss=0.0001248, whisper_loss=0.09686, over 20502.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0109, ecapa_loss=0.0001567, whisper_loss=0.09099, over 3893575.63 frames. 
], batch size: 81, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:38:45,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2371230.0, ans=22.5 2024-08-13 23:38:51,491 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 16 from Vox, 38 from AS 2024-08-13 23:39:13,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.304e+01 2.584e+01 2.839e+01 8.080e+01, threshold=5.168e+01, percent-clipped=1.0 2024-08-13 23:39:17,199 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 23:39:19,822 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 23 from Vox, 26 from AS 2024-08-13 23:39:27,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2371530.0, ans=0.1 2024-08-13 23:39:32,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2371630.0, ans=0.1 2024-08-13 23:39:37,739 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 23 from Vox, 20 from AS 2024-08-13 23:39:45,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5300, loss[loss=0.1216, beats_loss=0.008403, ecapa_loss=0.0002092, whisper_loss=0.1111, over 21756.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001583, whisper_loss=0.09228, over 3881897.41 frames. ], batch size: 91, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:39:46,570 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 from AS 2024-08-13 23:40:10,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2371930.0, ans=0.0 2024-08-13 23:40:24,021 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
17 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 23:40:36,559 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 from AS 2024-08-13 23:40:36,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2372130.0, ans=0.125 2024-08-13 23:40:37,979 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 from AS 2024-08-13 23:40:41,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-13 23:40:43,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2372130.0, ans=0.0 2024-08-13 23:40:45,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.85 vs. limit=22.5 2024-08-13 23:40:48,939 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5350, loss[loss=0.126, beats_loss=0.007392, ecapa_loss=0.0001377, whisper_loss=0.1172, over 17456.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001579, whisper_loss=0.09199, over 3854208.01 frames. ], batch size: 64, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:40:51,750 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 from AS 2024-08-13 23:40:52,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.86 vs. 
limit=15.0 2024-08-13 23:41:01,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2372330.0, ans=0.125 2024-08-13 23:41:03,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2372330.0, ans=0.125 2024-08-13 23:41:11,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2372330.0, ans=0.1 2024-08-13 23:41:21,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.441e+01 2.659e+01 2.902e+01 4.183e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-13 23:41:27,685 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 23:41:27,986 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:41:53,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5400, loss[loss=0.1021, beats_loss=0.009192, ecapa_loss=0.0001819, whisper_loss=0.09109, over 22826.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01073, ecapa_loss=0.0001574, whisper_loss=0.09239, over 3856543.66 frames. ], batch size: 92, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:42:11,173 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 from AS 2024-08-13 23:42:15,164 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 31 from Vox, 37 from AS 2024-08-13 23:42:18,953 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 23:42:34,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2373030.0, ans=0.1 2024-08-13 23:42:41,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2373030.0, ans=0.125 2024-08-13 23:42:56,290 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 23:42:57,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5450, loss[loss=0.09415, beats_loss=0.009743, ecapa_loss=0.0001482, whisper_loss=0.08292, over 15252.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.0001575, whisper_loss=0.0921, over 3842434.96 frames. ], batch size: 59, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:43:04,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-13 23:43:24,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2373430.0, ans=0.125 2024-08-13 23:43:29,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.305e+01 2.546e+01 2.870e+01 4.387e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-13 23:43:32,278 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 23:43:35,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.52 vs. 
limit=15.0 2024-08-13 23:43:38,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2373530.0, ans=0.0 2024-08-13 23:43:43,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2373530.0, ans=0.0 2024-08-13 23:43:52,826 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 23:43:53,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2373630.0, ans=0.125 2024-08-13 23:43:54,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2373630.0, ans=0.025 2024-08-13 23:43:57,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-13 23:44:02,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5500, loss[loss=0.09534, beats_loss=0.01228, ecapa_loss=0.0001734, whisper_loss=0.08132, over 16013.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001582, whisper_loss=0.09167, over 3873984.67 frames. ], batch size: 67, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:44:02,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2373730.0, ans=0.125 2024-08-13 23:44:05,090 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 23:44:08,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2373730.0, ans=0.04949747468305833 2024-08-13 23:44:09,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2373730.0, ans=0.125 2024-08-13 23:44:22,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2373830.0, ans=0.2 2024-08-13 23:44:43,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2373930.0, ans=0.1 2024-08-13 23:44:46,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2374030.0, ans=0.2 2024-08-13 23:45:14,983 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5550, loss[loss=0.1003, beats_loss=0.008862, ecapa_loss=0.0001809, whisper_loss=0.08966, over 17515.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001584, whisper_loss=0.09123, over 3880838.23 frames. ], batch size: 71, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:45:21,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2374230.0, ans=0.0 2024-08-13 23:45:35,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2374330.0, ans=0.125 2024-08-13 23:45:35,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2374330.0, ans=0.1 2024-08-13 23:45:38,690 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
14 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-13 23:45:39,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2374330.0, ans=0.0 2024-08-13 23:45:50,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2374430.0, ans=0.09899494936611666 2024-08-13 23:45:51,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.310e+01 2.523e+01 2.896e+01 4.190e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-13 23:45:56,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2374430.0, ans=0.1 2024-08-13 23:45:57,461 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 23:46:00,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2374530.0, ans=0.125 2024-08-13 23:46:02,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2374530.0, ans=0.0 2024-08-13 23:46:07,210 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 23:46:26,371 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5600, loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001667, whisper_loss=0.09013, over 16720.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001595, whisper_loss=0.09085, over 3882053.80 frames. ], batch size: 67, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:46:33,464 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-13 23:46:33,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2374730.0, ans=0.125 2024-08-13 23:46:44,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2374830.0, ans=0.2 2024-08-13 23:47:11,197 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 23:47:23,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2375030.0, ans=0.1 2024-08-13 23:47:28,375 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 23:47:39,936 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5650, loss[loss=0.1215, beats_loss=0.007987, ecapa_loss=0.0001743, whisper_loss=0.1118, over 19833.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001601, whisper_loss=0.09059, over 3912576.00 frames. ], batch size: 77, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:47:44,095 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 23:48:13,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.432e+01 2.622e+01 2.958e+01 1.611e+02, threshold=5.244e+01, percent-clipped=2.0 2024-08-13 23:48:16,709 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 23:48:26,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2375530.0, ans=0.07 2024-08-13 23:48:32,560 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 23:48:37,723 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 23:48:45,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2375730.0, ans=0.0 2024-08-13 23:48:46,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5700, loss[loss=0.1118, beats_loss=0.01084, ecapa_loss=0.0001617, whisper_loss=0.09933, over 19520.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01088, ecapa_loss=0.0001603, whisper_loss=0.0902, over 3911083.30 frames. ], batch size: 76, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:48:49,256 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 23:48:57,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2375730.0, ans=0.2 2024-08-13 23:49:10,838 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 23:49:20,674 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 23:49:25,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.98 vs. limit=10.0 2024-08-13 23:49:29,350 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 23:49:34,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0 2024-08-13 23:49:52,424 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-13 23:49:56,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2376230.0, ans=0.09899494936611666 2024-08-13 23:49:57,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5750, loss[loss=0.083, beats_loss=0.01071, ecapa_loss=0.0001587, whisper_loss=0.0707, over 19915.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01087, ecapa_loss=0.0001608, whisper_loss=0.09047, over 3907593.44 frames. ], batch size: 81, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:49:58,938 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.931e+01 2024-08-13 23:50:06,900 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 23:50:17,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2376330.0, ans=0.1 2024-08-13 23:50:19,982 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 23:50:32,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.376e+01 2.677e+01 2.886e+01 5.408e+01, threshold=5.355e+01, percent-clipped=1.0 2024-08-13 23:50:49,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2376530.0, ans=0.125 2024-08-13 23:50:56,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2376630.0, ans=0.0 2024-08-13 23:51:06,517 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-13 23:51:08,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.25 vs. 
limit=22.5 2024-08-13 23:51:09,479 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5800, loss[loss=0.08979, beats_loss=0.01183, ecapa_loss=0.0001319, whisper_loss=0.07664, over 17822.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001603, whisper_loss=0.0905, over 3870751.74 frames. ], batch size: 68, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:51:31,984 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 23:51:33,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2376830.0, ans=0.125 2024-08-13 23:51:34,929 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-13 23:51:39,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2376930.0, ans=0.2 2024-08-13 23:51:40,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2376930.0, ans=0.1 2024-08-13 23:51:51,105 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 23:51:53,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2377030.0, ans=0.1 2024-08-13 23:51:54,977 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 23:52:18,452 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5850, loss[loss=0.1095, beats_loss=0.01168, ecapa_loss=0.0001304, whisper_loss=0.09652, over 22871.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001603, whisper_loss=0.09127, over 3894543.62 frames. 
], batch size: 88, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:52:40,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2377330.0, ans=0.125 2024-08-13 23:52:44,431 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 23:52:50,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.427e+01 2.667e+01 3.028e+01 6.435e+01, threshold=5.335e+01, percent-clipped=1.0 2024-08-13 23:52:51,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2377430.0, ans=0.0 2024-08-13 23:53:00,419 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 23:53:01,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2377530.0, ans=0.125 2024-08-13 23:53:05,608 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 23:53:10,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2377630.0, ans=0.1 2024-08-13 23:53:23,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5900, loss[loss=0.09303, beats_loss=0.01053, ecapa_loss=0.000143, whisper_loss=0.08107, over 15326.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01087, ecapa_loss=0.0001596, whisper_loss=0.09063, over 3834483.13 frames. ], batch size: 59, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:53:23,855 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 23:53:25,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. 
limit=15.0 2024-08-13 23:53:30,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2377730.0, ans=0.2 2024-08-13 23:53:35,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2377830.0, ans=0.05 2024-08-13 23:53:46,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2377830.0, ans=0.125 2024-08-13 23:54:15,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2378130.0, ans=0.125 2024-08-13 23:54:23,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2378130.0, ans=10.0 2024-08-13 23:54:28,239 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 5950, loss[loss=0.09452, beats_loss=0.01142, ecapa_loss=0.0001531, whisper_loss=0.08157, over 13828.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001597, whisper_loss=0.09083, over 3862805.81 frames. ], batch size: 55, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:54:38,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2378230.0, ans=0.025 2024-08-13 23:54:42,541 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 23:54:57,874 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
37 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 23:55:00,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.346e+01 2.593e+01 2.833e+01 5.502e+01, threshold=5.186e+01, percent-clipped=1.0 2024-08-13 23:55:10,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2378530.0, ans=0.0 2024-08-13 23:55:15,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-08-13 23:55:27,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=15.0 2024-08-13 23:55:32,588 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6000, loss[loss=0.09912, beats_loss=0.01168, ecapa_loss=0.0001723, whisper_loss=0.08572, over 17829.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001598, whisper_loss=0.09132, over 3886617.10 frames. ], batch size: 74, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:55:32,588 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 23:56:14,170 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005558, whisper_loss=0.2472, over 922467.00 frames. 2024-08-13 23:56:35,071 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on SV_voxceleb1: loss=0.004377, beats_loss=0, ecapa_loss=0.0004377, whisper_loss=0, over 939242.00 frames. 2024-08-13 23:58:33,222 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on AT_audioset: loss=0.02362, beats_loss=0.02362, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 23:58:33,226 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-13 23:58:42,507 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 23:58:52,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2378830.0, ans=0.1 2024-08-13 23:58:56,464 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 23:59:06,191 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 23:59:08,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2378930.0, ans=0.2 2024-08-13 23:59:20,709 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09559547156095505, model_norm_threshold=51.8635368347168 2024-08-13 23:59:20,945 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.554e+04, grad_sumsq=7.554e+04, orig_rms_sq=1.000e+00 2024-08-13 23:59:27,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2379030.0, ans=0.125 2024-08-13 23:59:27,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2379030.0, ans=0.125 2024-08-13 23:59:31,147 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 23:59:44,458 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6050, loss[loss=0.125, beats_loss=0.0104, ecapa_loss=0.0001545, whisper_loss=0.1131, over 20998.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001597, whisper_loss=0.09082, over 3880748.33 frames. ], batch size: 81, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:59:46,132 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 23:59:53,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2379230.0, ans=0.0 2024-08-13 23:59:54,561 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 00:00:07,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2379330.0, ans=0.0 2024-08-14 00:00:15,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2379430.0, ans=0.125 2024-08-14 00:00:18,027 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 00:00:20,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.343e+01 2.535e+01 2.756e+01 5.425e+02, threshold=5.070e+01, percent-clipped=3.0 2024-08-14 00:00:42,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2379630.0, ans=0.2 2024-08-14 00:00:44,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2379630.0, ans=0.125 2024-08-14 00:00:49,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2379630.0, ans=0.1 2024-08-14 00:00:57,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6100, loss[loss=0.1047, beats_loss=0.009655, ecapa_loss=0.0001758, whisper_loss=0.09329, over 22762.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0109, ecapa_loss=0.0001595, whisper_loss=0.09107, over 3906111.43 frames. 
], batch size: 93, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:01:19,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2379830.0, ans=15.0 2024-08-14 00:01:20,030 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 00:01:23,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.79 vs. limit=22.5 2024-08-14 00:01:24,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2379930.0, ans=0.125 2024-08-14 00:01:36,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=2379930.0, ans=22.5 2024-08-14 00:01:40,643 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:01:49,117 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 00:01:58,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2380130.0, ans=0.0 2024-08-14 00:02:04,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6150, loss[loss=0.09168, beats_loss=0.0115, ecapa_loss=0.0001541, whisper_loss=0.07864, over 22416.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.0001601, whisper_loss=0.09156, over 3907265.68 frames. 
], batch size: 91, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:02:12,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2380230.0, ans=0.0 2024-08-14 00:02:23,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2380330.0, ans=0.04949747468305833 2024-08-14 00:02:29,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2380430.0, ans=0.025 2024-08-14 00:02:31,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-14 00:02:32,904 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 16 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 00:02:36,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.475e+01 2.774e+01 3.233e+01 4.746e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 00:02:55,349 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 00:02:56,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2380630.0, ans=0.125 2024-08-14 00:03:03,072 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.730e-02 2024-08-14 00:03:09,099 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6200, loss[loss=0.1276, beats_loss=0.008858, ecapa_loss=0.0001709, whisper_loss=0.117, over 23378.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001606, whisper_loss=0.09123, over 3892349.61 frames. 
], batch size: 92, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:03:18,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-14 00:03:35,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2380930.0, ans=0.0 2024-08-14 00:03:38,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0 2024-08-14 00:03:46,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2380930.0, ans=0.2 2024-08-14 00:03:55,076 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:04:15,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6250, loss[loss=0.1059, beats_loss=0.01053, ecapa_loss=0.0001598, whisper_loss=0.09378, over 19081.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001615, whisper_loss=0.09092, over 3877553.84 frames. ], batch size: 78, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:04:15,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2381230.0, ans=0.125 2024-08-14 00:04:18,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2381230.0, ans=0.125 2024-08-14 00:04:20,620 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.048e-02 2024-08-14 00:04:23,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2381230.0, ans=0.2 2024-08-14 00:04:38,568 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 20 from Vox, 35 from AS
2024-08-14 00:04:48,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.400e+01 2.693e+01 3.116e+01 1.076e+02, threshold=5.386e+01, percent-clipped=3.0
2024-08-14 00:05:00,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2381530.0, ans=0.125
2024-08-14 00:05:19,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2381730.0, ans=0.125
2024-08-14 00:05:19,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6300, loss[loss=0.1182, beats_loss=0.01066, ecapa_loss=0.0001559, whisper_loss=0.106, over 23439.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01078, ecapa_loss=0.0001606, whisper_loss=0.09124, over 3888391.21 frames. ], batch size: 92, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:05:20,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2381730.0, ans=0.125
2024-08-14 00:05:22,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=12.0
2024-08-14 00:05:50,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2381930.0, ans=0.125
2024-08-14 00:05:51,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2381930.0, ans=0.05
2024-08-14 00:06:06,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0
2024-08-14 00:06:13,440 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 19 from Vox, 24 from AS
2024-08-14 00:06:13,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2382130.0, ans=0.125
2024-08-14 00:06:16,116 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 15 from LS+wenet, 22 from Vox, 36 from AS
2024-08-14 00:06:21,864 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.55 vs. limit=6.0
2024-08-14 00:06:23,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6350, loss[loss=0.1128, beats_loss=0.01249, ecapa_loss=0.0001349, whisper_loss=0.099, over 16953.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001614, whisper_loss=0.09156, over 3863136.25 frames. ], batch size: 66, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:06:34,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5
2024-08-14 00:06:38,204 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 19 from Vox, 45 from AS
2024-08-14 00:06:47,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2382330.0, ans=0.05
2024-08-14 00:06:57,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.344e+01 2.620e+01 2.945e+01 1.011e+02, threshold=5.239e+01, percent-clipped=2.0
2024-08-14 00:07:02,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0
2024-08-14 00:07:28,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6400, loss[loss=0.1273, beats_loss=0.009052, ecapa_loss=0.0002014, whisper_loss=0.1162, over 17059.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001602, whisper_loss=0.09117, over 3856109.84 frames. ], batch size: 68, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:07:31,315 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 00:07:46,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2382830.0, ans=0.125
2024-08-14 00:07:55,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2382930.0, ans=0.2
2024-08-14 00:08:05,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2382930.0, ans=0.07
2024-08-14 00:08:23,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2383130.0, ans=0.125
2024-08-14 00:08:29,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2383130.0, ans=0.125
2024-08-14 00:08:31,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2383130.0, ans=0.125
2024-08-14 00:08:34,034 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6450, loss[loss=0.09662, beats_loss=0.01212, ecapa_loss=0.000155, whisper_loss=0.08295, over 22493.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001602, whisper_loss=0.09148, over 3918282.25 frames. ], batch size: 90, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:08:36,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2383230.0, ans=0.07
2024-08-14 00:08:54,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2383330.0, ans=0.1
2024-08-14 00:09:02,758 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0
2024-08-14 00:09:03,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2383430.0, ans=0.07
2024-08-14 00:09:06,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.325e+01 2.600e+01 2.932e+01 4.417e+01, threshold=5.200e+01, percent-clipped=0.0
2024-08-14 00:09:09,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2383430.0, ans=0.125
2024-08-14 00:09:14,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2383530.0, ans=0.125
2024-08-14 00:09:18,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0
2024-08-14 00:09:35,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2383630.0, ans=0.125
2024-08-14 00:09:37,826 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6500, loss[loss=0.1136, beats_loss=0.01155, ecapa_loss=0.0001626, whisper_loss=0.1004, over 18453.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01091, ecapa_loss=0.0001595, whisper_loss=0.09183, over 3924016.40 frames. ], batch size: 74, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:09:43,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2383730.0, ans=0.125
2024-08-14 00:10:30,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2384130.0, ans=0.125
2024-08-14 00:10:34,655 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 12 from Vox, 41 from AS
2024-08-14 00:10:41,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6550, loss[loss=0.1453, beats_loss=0.007758, ecapa_loss=0.0001798, whisper_loss=0.1358, over 23585.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01087, ecapa_loss=0.0001598, whisper_loss=0.09234, over 3964203.68 frames. ], batch size: 90, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:11:11,637 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 from AS
2024-08-14 00:11:15,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.424e+01 2.648e+01 2.996e+01 4.448e+01, threshold=5.297e+01, percent-clipped=0.0
2024-08-14 00:11:20,242 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 27 from Vox, 38 from AS
2024-08-14 00:11:21,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2384530.0, ans=0.2
2024-08-14 00:11:30,795 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 from AS
2024-08-14 00:11:33,242 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 from AS
2024-08-14 00:11:45,380 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09902217984199524, model_norm_threshold=52.96651840209961
2024-08-14 00:11:45,548 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.30, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.471e+04, grad_sumsq=8.471e+04, orig_rms_sq=1.000e+00
2024-08-14 00:11:45,575 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6600, loss[loss=0.07711, beats_loss=0.01193, ecapa_loss=0.0001617, whisper_loss=0.06356, over 14021.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.0001599, whisper_loss=0.09154, over 3936904.02 frames. ], batch size: 57, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:12:04,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2384830.0, ans=0.125
2024-08-14 00:12:05,900 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 from AS
2024-08-14 00:12:16,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2384930.0, ans=0.0
2024-08-14 00:12:32,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2385030.0, ans=0.125
2024-08-14 00:12:48,489 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.374e-01
2024-08-14 00:12:49,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6650, loss[loss=0.08939, beats_loss=0.01217, ecapa_loss=0.0001364, whisper_loss=0.07586, over 18459.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001619, whisper_loss=0.09137, over 3941523.11 frames. ], batch size: 71, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:12:49,375 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 from AS
2024-08-14 00:13:12,006 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 27 from LS+wenet, 10 from Vox, 19 from AS
2024-08-14 00:13:16,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5
2024-08-14 00:13:22,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.456e+01 2.724e+01 3.056e+01 5.349e+02, threshold=5.448e+01, percent-clipped=1.0
2024-08-14 00:13:40,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2385630.0, ans=0.07
2024-08-14 00:13:44,181 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 00:13:53,117 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6700, loss[loss=0.0937, beats_loss=0.01103, ecapa_loss=0.0001426, whisper_loss=0.08125, over 22489.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001624, whisper_loss=0.09137, over 3924020.31 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:14:12,147 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 from AS
2024-08-14 00:14:23,257 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5
2024-08-14 00:14:26,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2385930.0, ans=0.125
2024-08-14 00:14:35,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2386030.0, ans=0.125
2024-08-14 00:14:57,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0
2024-08-14 00:14:57,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6750, loss[loss=0.1113, beats_loss=0.008738, ecapa_loss=0.0001704, whisper_loss=0.1009, over 20599.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01077, ecapa_loss=0.0001614, whisper_loss=0.09151, over 3903135.38 frames. ], batch size: 79, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:15:18,738 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 00:15:25,072 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 26 from Vox, 28 from AS
2024-08-14 00:15:26,917 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5
2024-08-14 00:15:31,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.377e+01 2.658e+01 2.891e+01 6.359e+01, threshold=5.316e+01, percent-clipped=1.0
2024-08-14 00:15:35,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2386530.0, ans=0.125
2024-08-14 00:15:41,021 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 from AS
2024-08-14 00:15:45,923 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS
2024-08-14 00:15:52,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2386630.0, ans=0.125
2024-08-14 00:15:52,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0
2024-08-14 00:16:02,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6800, loss[loss=0.1103, beats_loss=0.01192, ecapa_loss=0.0001552, whisper_loss=0.09683, over 20389.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001621, whisper_loss=0.092, over 3884322.01 frames. ], batch size: 79, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:16:03,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0
2024-08-14 00:16:03,724 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 from AS
2024-08-14 00:16:29,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.40 vs. limit=22.5
2024-08-14 00:16:34,754 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 from AS
2024-08-14 00:16:57,543 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 from AS
2024-08-14 00:17:06,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6850, loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001481, whisper_loss=0.09025, over 19240.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001617, whisper_loss=0.09134, over 3879586.95 frames. ], batch size: 77, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:17:22,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2387330.0, ans=0.125
2024-08-14 00:17:41,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.424e+01 2.658e+01 2.894e+01 9.462e+01, threshold=5.316e+01, percent-clipped=2.0
2024-08-14 00:17:49,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2387530.0, ans=0.125
2024-08-14 00:17:51,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2387530.0, ans=0.2
2024-08-14 00:17:55,251 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS
2024-08-14 00:18:05,836 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 from AS
2024-08-14 00:18:11,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6900, loss[loss=0.0847, beats_loss=0.01167, ecapa_loss=0.0001568, whisper_loss=0.07146, over 13734.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001635, whisper_loss=0.09097, over 3840199.36 frames. ], batch size: 55, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:18:12,177 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 from AS
2024-08-14 00:18:19,120 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS
2024-08-14 00:18:43,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0
2024-08-14 00:18:43,503 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 18 from Vox, 35 from AS
2024-08-14 00:19:08,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2388130.0, ans=0.0
2024-08-14 00:19:20,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 6950, loss[loss=0.1057, beats_loss=0.0126, ecapa_loss=0.0001635, whisper_loss=0.09146, over 20348.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001633, whisper_loss=0.091, over 3835242.60 frames. ], batch size: 79, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:19:22,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2388230.0, ans=0.2
2024-08-14 00:19:57,553 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 17 from Vox, 24 from AS
2024-08-14 00:20:00,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.466e+01 2.702e+01 3.028e+01 4.381e+01, threshold=5.405e+01, percent-clipped=0.0
2024-08-14 00:20:06,708 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 from AS
2024-08-14 00:20:13,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2388530.0, ans=0.125
2024-08-14 00:20:17,666 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS
2024-08-14 00:20:22,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2388630.0, ans=0.1
2024-08-14 00:20:32,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2388630.0, ans=0.125
2024-08-14 00:20:37,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2388730.0, ans=0.2
2024-08-14 00:20:38,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7000, loss[loss=0.1152, beats_loss=0.01253, ecapa_loss=0.0001481, whisper_loss=0.1011, over 22036.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01075, ecapa_loss=0.0001629, whisper_loss=0.09129, over 3848886.47 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:20:55,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2388830.0, ans=0.125
2024-08-14 00:20:58,547 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 from AS
2024-08-14 00:20:58,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2388830.0, ans=0.125
2024-08-14 00:21:03,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2388830.0, ans=0.0
2024-08-14 00:21:08,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2388930.0, ans=0.1
2024-08-14 00:21:21,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2388930.0, ans=0.0
2024-08-14 00:21:34,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2389030.0, ans=0.125
2024-08-14 00:21:41,266 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 00:21:48,898 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 15 from Vox, 48 from AS
2024-08-14 00:21:58,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7050, loss[loss=0.1096, beats_loss=0.01199, ecapa_loss=0.0001345, whisper_loss=0.09625, over 14006.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001621, whisper_loss=0.09104, over 3849786.96 frames. ], batch size: 53, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:22:01,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2389230.0, ans=0.125
2024-08-14 00:22:11,696 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2024-08-14 00:22:14,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2389330.0, ans=0.1
2024-08-14 00:22:42,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.266e+01 2.592e+01 2.903e+01 1.485e+02, threshold=5.183e+01, percent-clipped=2.0
2024-08-14 00:22:42,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0
2024-08-14 00:22:45,530 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 from AS
2024-08-14 00:22:45,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2389430.0, ans=0.125
2024-08-14 00:23:09,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2389630.0, ans=0.2
2024-08-14 00:23:18,275 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 from AS
2024-08-14 00:23:19,720 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7100, loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001579, whisper_loss=0.09021, over 18967.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.000162, whisper_loss=0.09076, over 3867800.20 frames. ], batch size: 75, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:23:27,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2389730.0, ans=0.125
2024-08-14 00:23:40,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0
2024-08-14 00:23:41,356 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 12 from Vox, 42 from AS
2024-08-14 00:23:56,281 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS
2024-08-14 00:23:58,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=15.0
2024-08-14 00:23:59,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2389930.0, ans=0.0
2024-08-14 00:24:02,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2389930.0, ans=0.125
2024-08-14 00:24:18,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0
2024-08-14 00:24:39,422 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7150, loss[loss=0.07658, beats_loss=0.01197, ecapa_loss=0.0001202, whisper_loss=0.06341, over 16578.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001599, whisper_loss=0.09056, over 3864879.38 frames. ], batch size: 66, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:25:01,198 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0
2024-08-14 00:25:20,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.385e+01 2.638e+01 3.035e+01 4.278e+01, threshold=5.277e+01, percent-clipped=0.0
2024-08-14 00:25:21,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2390430.0, ans=0.125
2024-08-14 00:25:22,876 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5
2024-08-14 00:25:36,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2390530.0, ans=0.5
2024-08-14 00:25:52,240 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 34 from LS+wenet, 21 from Vox, 27 from AS
2024-08-14 00:25:57,498 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7200, loss[loss=0.08545, beats_loss=0.01464, ecapa_loss=0.0001304, whisper_loss=0.0695, over 23166.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01087, ecapa_loss=0.0001598, whisper_loss=0.09033, over 3863384.92 frames. ], batch size: 94, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:26:09,433 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 from AS
2024-08-14 00:26:17,543 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 from AS
2024-08-14 00:26:17,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2390830.0, ans=0.125
2024-08-14 00:26:19,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2390830.0, ans=0.125
2024-08-14 00:26:20,151 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 from AS
2024-08-14 00:26:34,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2390930.0, ans=0.0
2024-08-14 00:26:37,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5
2024-08-14 00:26:55,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5
2024-08-14 00:26:58,954 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 16 from Vox, 37 from AS
2024-08-14 00:27:10,400 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 16 from Vox, 41 from AS
2024-08-14 00:27:14,490 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7250, loss[loss=0.08188, beats_loss=0.01268, ecapa_loss=0.0001562, whisper_loss=0.06764, over 21926.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01088, ecapa_loss=0.0001591, whisper_loss=0.09008, over 3900930.58 frames. ], batch size: 91, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:27:15,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2391230.0, ans=0.1
2024-08-14 00:27:27,561 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 from AS
2024-08-14 00:27:35,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2391330.0, ans=0.05
2024-08-14 00:27:50,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2391430.0, ans=0.125
2024-08-14 00:27:55,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.401e+01 2.589e+01 2.911e+01 7.095e+01, threshold=5.179e+01, percent-clipped=1.0
2024-08-14 00:27:56,108 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 24 from Vox, 19 from AS
2024-08-14 00:27:57,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2391430.0, ans=0.0
2024-08-14 00:27:59,843 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0
2024-08-14 00:28:33,704 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7300, loss[loss=0.09512, beats_loss=0.013, ecapa_loss=9.861e-05, whisper_loss=0.08113, over 17728.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.000159, whisper_loss=0.0908, over 3898481.30 frames. ], batch size: 67, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:28:56,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2391830.0, ans=0.125
2024-08-14 00:29:07,570 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 39 from LS+wenet, 25 from Vox, 24 from AS
2024-08-14 00:29:12,568 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS
2024-08-14 00:29:17,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0
2024-08-14 00:29:29,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2392030.0, ans=0.0
2024-08-14 00:29:38,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2392130.0, ans=0.125
2024-08-14 00:29:46,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=15.0
2024-08-14 00:29:47,688 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 27 from LS+wenet, 17 from Vox, 21 from AS
2024-08-14 00:29:50,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7350, loss[loss=0.1054, beats_loss=0.009944, ecapa_loss=0.0001387, whisper_loss=0.0941, over 16151.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001604, whisper_loss=0.09095, over 3900185.41 frames. ], batch size: 63, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:29:57,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2392230.0, ans=0.2
2024-08-14 00:30:16,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2392330.0, ans=0.125
2024-08-14 00:30:19,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2392330.0, ans=0.125
2024-08-14 00:30:24,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2392430.0, ans=0.125
2024-08-14 00:30:32,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.397e+01 2.587e+01 2.821e+01 4.137e+01, threshold=5.175e+01, percent-clipped=0.0
2024-08-14 00:31:12,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7400, loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001791, whisper_loss=0.09029, over 15523.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001603, whisper_loss=0.09044, over 3857117.08 frames. ], batch size: 60, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:31:17,620 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 from AS
2024-08-14 00:31:20,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2392730.0, ans=0.125
2024-08-14 00:31:45,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2392930.0, ans=0.2
2024-08-14 00:31:49,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2392930.0, ans=0.2
2024-08-14 00:31:54,309 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS
2024-08-14 00:31:56,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2392930.0, ans=0.125
2024-08-14 00:32:04,061 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 00:32:10,606 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 00:32:18,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2393130.0, ans=0.0
2024-08-14 00:32:34,300 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7450, loss[loss=0.1043, beats_loss=0.01056, ecapa_loss=0.0001924, whisper_loss=0.09177, over 21774.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001601, whisper_loss=0.09095, over 3853202.60 frames. ], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:32:57,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2393330.0, ans=10.0
2024-08-14 00:33:19,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.395e+01 2.642e+01 3.080e+01 4.669e+01, threshold=5.285e+01, percent-clipped=0.0
2024-08-14 00:33:24,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2393530.0, ans=0.0
2024-08-14 00:33:45,785 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS
2024-08-14 00:34:13,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2393630.0, ans=0.2
2024-08-14 00:34:17,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2393630.0, ans=0.125
2024-08-14 00:34:24,308 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7500, loss[loss=0.09552, beats_loss=0.009939, ecapa_loss=0.0002365, whisper_loss=0.08321, over 21382.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001606, whisper_loss=0.09132, over 3866111.58 frames. ], batch size: 94, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:34:25,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2393730.0, ans=0.015
2024-08-14 00:34:29,319 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS
2024-08-14 00:34:29,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.32 vs. limit=22.5
2024-08-14 00:34:46,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2393830.0, ans=0.1
2024-08-14 00:34:54,821 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.880e-01
2024-08-14 00:34:59,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2393930.0, ans=0.0
2024-08-14 00:35:14,204 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 from AS
2024-08-14 00:35:16,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0
2024-08-14 00:35:34,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2394130.0, ans=0.125
2024-08-14 00:35:36,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0
2024-08-14 00:35:38,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2394130.0, ans=0.0
2024-08-14 00:35:44,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7550, loss[loss=0.08569, beats_loss=0.01073, ecapa_loss=0.000195, whisper_loss=0.07302, over 15883.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001618, whisper_loss=0.09098, over 3839056.27 frames. ], batch size: 66, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:35:46,135 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 13 from Vox, 36 from AS
2024-08-14 00:35:53,075 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 00:35:59,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2394330.0, ans=0.125
2024-08-14 00:36:00,505 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 from AS
2024-08-14 00:36:00,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2394330.0, ans=0.0
2024-08-14 00:36:06,442 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 from AS
2024-08-14 00:36:07,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2394330.0, ans=0.125
2024-08-14 00:36:12,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2394330.0, ans=0.0
2024-08-14 00:36:16,171 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 from AS
2024-08-14 00:36:22,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2394430.0, ans=0.0
2024-08-14 00:36:24,780 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 from AS
2024-08-14 00:36:26,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.308e+01 2.563e+01 2.921e+01 3.982e+01, threshold=5.125e+01, percent-clipped=0.0
2024-08-14 00:36:28,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2394430.0, ans=0.0
2024-08-14 00:37:03,666 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 14 from Vox, 40 from AS
2024-08-14 00:37:05,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7600, loss[loss=0.1226, beats_loss=0.01024, ecapa_loss=0.0001372, whisper_loss=0.111, over 23858.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001607, whisper_loss=0.09116, over 3819974.46 frames. ], batch size: 91, lr: 3.68e-03, grad_scale: 5.764607523034235e+17
2024-08-14 00:37:06,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2394730.0, ans=0.1
2024-08-14 00:37:14,522 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts.
20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 00:37:18,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.06 vs. limit=22.5 2024-08-14 00:37:31,217 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 00:37:41,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2394930.0, ans=0.0 2024-08-14 00:37:46,499 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 00:37:49,940 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 00:37:54,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2395030.0, ans=0.0 2024-08-14 00:37:59,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2395030.0, ans=0.0 2024-08-14 00:38:03,896 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 00:38:24,406 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7650, loss[loss=0.1058, beats_loss=0.01138, ecapa_loss=0.0001446, whisper_loss=0.09299, over 23095.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.00016, whisper_loss=0.09118, over 3809833.91 frames. 
], batch size: 91, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:38:25,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2395230.0, ans=0.1 2024-08-14 00:38:37,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2395230.0, ans=0.0 2024-08-14 00:38:37,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2395230.0, ans=0.125 2024-08-14 00:38:39,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2395330.0, ans=0.0 2024-08-14 00:38:46,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2395330.0, ans=0.125 2024-08-14 00:38:53,274 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 00:38:55,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 00:39:07,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.336e+01 2.593e+01 2.907e+01 5.798e+01, threshold=5.186e+01, percent-clipped=1.0 2024-08-14 00:39:09,069 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 00:39:20,486 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 00:39:30,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2395630.0, ans=0.125 2024-08-14 00:39:30,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2395630.0, ans=0.2 2024-08-14 00:39:32,415 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2024-08-14 00:39:33,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2395630.0, ans=0.0 2024-08-14 00:39:35,181 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 00:39:38,758 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-14 00:39:39,887 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 00:39:47,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7700, loss[loss=0.1153, beats_loss=0.008353, ecapa_loss=0.0001595, whisper_loss=0.1053, over 19565.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.00016, whisper_loss=0.09121, over 3841137.80 frames. ], batch size: 71, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:39:47,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2395730.0, ans=6.0 2024-08-14 00:40:04,116 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.55 vs. 
limit=12.0 2024-08-14 00:40:23,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2395930.0, ans=0.125 2024-08-14 00:40:23,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2395930.0, ans=0.125 2024-08-14 00:40:25,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2024-08-14 00:40:28,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2395930.0, ans=0.125 2024-08-14 00:40:34,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2396030.0, ans=0.125 2024-08-14 00:40:51,996 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 00:40:52,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2396130.0, ans=0.1 2024-08-14 00:40:54,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2396130.0, ans=0.0 2024-08-14 00:41:04,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5 2024-08-14 00:41:07,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7750, loss[loss=0.1332, beats_loss=0.007541, ecapa_loss=0.0001597, whisper_loss=0.1241, over 24038.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001596, whisper_loss=0.09086, over 3849780.35 frames. ], batch size: 88, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:41:14,228 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 00:41:23,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2396330.0, ans=0.125 2024-08-14 00:41:41,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2024-08-14 00:41:49,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.485e+01 2.781e+01 3.099e+01 5.095e+01, threshold=5.562e+01, percent-clipped=0.0 2024-08-14 00:41:53,026 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 00:41:58,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2396530.0, ans=0.125 2024-08-14 00:42:00,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2396530.0, ans=0.125 2024-08-14 00:42:26,653 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7800, loss[loss=0.1066, beats_loss=0.01197, ecapa_loss=0.0001865, whisper_loss=0.09276, over 21750.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.000159, whisper_loss=0.09129, over 3858537.79 frames. ], batch size: 91, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:43:15,058 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 00:43:32,532 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-14 00:43:49,893 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7850, loss[loss=0.09349, beats_loss=0.01133, ecapa_loss=0.0001538, whisper_loss=0.08062, over 22957.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001595, whisper_loss=0.09108, over 3855954.48 frames. 
], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:43:51,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2397230.0, ans=0.125 2024-08-14 00:44:00,878 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 00:44:04,114 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 33 from Vox, 19 fro AS 2024-08-14 00:44:13,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2397330.0, ans=0.125 2024-08-14 00:44:23,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2397430.0, ans=0.025 2024-08-14 00:44:24,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-14 00:44:30,190 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.344e+01 2.594e+01 2.942e+01 8.076e+01, threshold=5.188e+01, percent-clipped=2.0 2024-08-14 00:44:33,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2397430.0, ans=0.2 2024-08-14 00:44:47,580 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:44:50,130 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 00:44:54,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. 
limit=15.0 2024-08-14 00:45:00,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2397630.0, ans=0.0 2024-08-14 00:45:08,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=22.5 2024-08-14 00:45:08,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7900, loss[loss=0.08835, beats_loss=0.009514, ecapa_loss=0.0001606, whisper_loss=0.07723, over 19260.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001603, whisper_loss=0.09077, over 3843347.69 frames. ], batch size: 80, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:45:10,432 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-14 00:45:10,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2397730.0, ans=0.0 2024-08-14 00:45:13,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2397730.0, ans=0.125 2024-08-14 00:45:20,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2397730.0, ans=0.04949747468305833 2024-08-14 00:45:30,902 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 00:45:40,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2397930.0, ans=0.2 2024-08-14 00:45:54,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2398030.0, ans=0.125 2024-08-14 00:46:27,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 7950, loss[loss=0.08211, beats_loss=0.01342, ecapa_loss=0.0001249, whisper_loss=0.06744, over 22446.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001599, whisper_loss=0.09061, over 3842684.15 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:46:33,685 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 00:46:34,845 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 00:46:44,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0 2024-08-14 00:46:49,677 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 00:46:54,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2398330.0, ans=0.0 2024-08-14 00:46:56,741 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 00:46:59,860 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 00:47:00,524 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. 
limit=10.0 2024-08-14 00:47:04,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2398430.0, ans=0.2 2024-08-14 00:47:06,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.380e+01 2.671e+01 3.071e+01 4.593e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-14 00:47:16,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2398530.0, ans=0.125 2024-08-14 00:47:28,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2398630.0, ans=0.125 2024-08-14 00:47:36,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2398630.0, ans=0.0 2024-08-14 00:47:41,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8000, loss[loss=0.09832, beats_loss=0.01241, ecapa_loss=0.000148, whisper_loss=0.08443, over 17112.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.000159, whisper_loss=0.091, over 3845421.93 frames. ], batch size: 68, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:47:50,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2398730.0, ans=0.0 2024-08-14 00:47:56,980 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 00:48:05,601 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 00:48:07,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2398830.0, ans=0.125 2024-08-14 00:48:28,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.82 vs. 
limit=10.0 2024-08-14 00:48:36,666 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 00:48:37,895 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 00:48:46,811 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 00:48:52,587 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 29 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-14 00:48:52,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2399130.0, ans=0.035 2024-08-14 00:48:52,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2399130.0, ans=0.1 2024-08-14 00:48:57,353 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8050, loss[loss=0.09102, beats_loss=0.01396, ecapa_loss=0.0001411, whisper_loss=0.07564, over 20668.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001598, whisper_loss=0.09116, over 3852376.28 frames. ], batch size: 87, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:49:08,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2399230.0, ans=0.125 2024-08-14 00:49:18,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-14 00:49:33,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.422e+01 2.734e+01 3.214e+01 1.918e+02, threshold=5.469e+01, percent-clipped=2.0 2024-08-14 00:49:43,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2399530.0, ans=0.04949747468305833 2024-08-14 00:50:03,676 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-14 00:50:10,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8100, loss[loss=0.08772, beats_loss=0.01144, ecapa_loss=0.0001613, whisper_loss=0.07467, over 22720.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001606, whisper_loss=0.09029, over 3846735.17 frames. ], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:50:12,225 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2024-08-14 00:50:37,576 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.947e+00 2024-08-14 00:51:01,401 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 00:51:10,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2400030.0, ans=0.1 2024-08-14 00:51:32,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8150, loss[loss=0.1336, beats_loss=0.008656, ecapa_loss=0.0001299, whisper_loss=0.1236, over 18379.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001615, whisper_loss=0.09022, over 3847010.69 frames. ], batch size: 67, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:51:35,408 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 00:51:35,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2400230.0, ans=0.125 2024-08-14 00:51:49,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2400330.0, ans=0.125 2024-08-14 00:52:07,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2400430.0, ans=0.125 2024-08-14 00:52:13,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.374e+01 2.607e+01 2.976e+01 8.538e+01, threshold=5.213e+01, percent-clipped=1.0 2024-08-14 00:52:18,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2400530.0, ans=0.2 2024-08-14 00:52:24,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2400530.0, ans=0.1 2024-08-14 00:52:30,264 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 00:52:34,585 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 00:52:45,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2400630.0, ans=10.0 2024-08-14 00:52:49,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8200, loss[loss=0.1109, beats_loss=0.01191, ecapa_loss=0.0001831, whisper_loss=0.09716, over 20164.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001601, whisper_loss=0.09084, over 3863528.22 frames. ], batch size: 84, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:52:51,226 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.80 vs. 
limit=15.0 2024-08-14 00:53:04,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2400830.0, ans=0.125 2024-08-14 00:53:27,586 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 00:53:49,516 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 00:54:03,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2401130.0, ans=0.125 2024-08-14 00:54:06,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8250, loss[loss=0.09355, beats_loss=0.01477, ecapa_loss=9.564e-05, whisper_loss=0.07783, over 24133.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001583, whisper_loss=0.09125, over 3860114.92 frames. ], batch size: 93, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:54:35,139 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 00:54:46,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.415e+01 2.692e+01 3.047e+01 4.213e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-14 00:54:52,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2401530.0, ans=0.04949747468305833 2024-08-14 00:54:52,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2401530.0, ans=0.125 2024-08-14 00:55:00,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2401530.0, ans=0.0 2024-08-14 00:55:11,205 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 00:55:21,853 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
22 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 00:55:26,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8300, loss[loss=0.1054, beats_loss=0.009775, ecapa_loss=0.0001724, whisper_loss=0.09391, over 20315.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001572, whisper_loss=0.09151, over 3900643.93 frames. ], batch size: 79, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:55:44,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2401830.0, ans=0.0 2024-08-14 00:56:04,056 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 00:56:06,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2401930.0, ans=0.1 2024-08-14 00:56:31,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2402030.0, ans=0.0 2024-08-14 00:56:48,407 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 00:56:52,033 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8350, loss[loss=0.1091, beats_loss=0.01035, ecapa_loss=0.000178, whisper_loss=0.09696, over 17497.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001571, whisper_loss=0.09193, over 3885424.96 frames. ], batch size: 71, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:57:09,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2402330.0, ans=0.0 2024-08-14 00:57:13,355 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 00:57:18,487 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
12 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 00:57:28,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2402430.0, ans=0.125 2024-08-14 00:57:36,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.282e+01 2.635e+01 3.067e+01 5.691e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-14 00:57:55,318 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:58:05,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2402630.0, ans=0.0 2024-08-14 00:58:11,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2402630.0, ans=0.0 2024-08-14 00:58:18,324 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8400, loss[loss=0.1117, beats_loss=0.01297, ecapa_loss=0.0001471, whisper_loss=0.09728, over 15340.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001591, whisper_loss=0.09198, over 3869913.89 frames. ], batch size: 60, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:58:23,622 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 00:58:29,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2402730.0, ans=0.0 2024-08-14 00:58:34,035 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 00:58:47,342 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.17 vs. 
limit=12.0 2024-08-14 00:59:02,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2402930.0, ans=0.125 2024-08-14 00:59:09,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2403030.0, ans=0.2 2024-08-14 00:59:27,133 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 00:59:28,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2403130.0, ans=0.2 2024-08-14 00:59:30,168 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 13 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 00:59:39,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2403130.0, ans=0.0 2024-08-14 00:59:42,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8450, loss[loss=0.09007, beats_loss=0.01281, ecapa_loss=0.0001266, whisper_loss=0.076, over 13255.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.0001593, whisper_loss=0.09163, over 3852166.90 frames. ], batch size: 53, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:00:05,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-08-14 01:00:12,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2403330.0, ans=0.0 2024-08-14 01:00:22,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2403430.0, ans=0.0 2024-08-14 01:00:25,863 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
32 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 01:00:26,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.360e+01 2.603e+01 2.918e+01 4.445e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 01:00:29,970 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 18 from LS+wenet, 18 from Vox, 55 fro AS 2024-08-14 01:00:31,238 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 01:00:33,370 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:00:34,346 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 01:00:38,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2403530.0, ans=0.1 2024-08-14 01:00:39,547 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 01:00:47,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2403530.0, ans=0.0 2024-08-14 01:00:50,206 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 01:01:08,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8500, loss[loss=0.1083, beats_loss=0.009262, ecapa_loss=0.0001812, whisper_loss=0.09718, over 22637.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.0001592, whisper_loss=0.09146, over 3865122.98 frames. 
], batch size: 92, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:01:22,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2403730.0, ans=0.0 2024-08-14 01:01:25,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2403830.0, ans=0.125 2024-08-14 01:01:36,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2403830.0, ans=0.125 2024-08-14 01:01:42,447 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 01:02:00,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2024-08-14 01:02:22,969 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 01:02:25,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2404130.0, ans=15.0 2024-08-14 01:02:33,765 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8550, loss[loss=0.09869, beats_loss=0.01058, ecapa_loss=0.0001542, whisper_loss=0.08657, over 19090.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001588, whisper_loss=0.09185, over 3849318.08 frames. 
], batch size: 74, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:02:34,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2404230.0, ans=0.125 2024-08-14 01:03:15,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2404430.0, ans=0.125 2024-08-14 01:03:18,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.347e+01 2.626e+01 2.928e+01 4.701e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 01:03:19,870 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 01:03:26,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=22.5 2024-08-14 01:03:38,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2404530.0, ans=0.125 2024-08-14 01:04:02,840 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8600, loss[loss=0.1278, beats_loss=0.0101, ecapa_loss=0.000139, whisper_loss=0.1163, over 20747.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001581, whisper_loss=0.09148, over 3874736.23 frames. ], batch size: 78, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:04:03,098 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 01:04:11,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2404730.0, ans=0.125 2024-08-14 01:04:15,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2404730.0, ans=0.0 2024-08-14 01:04:30,075 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 01:04:40,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2404930.0, ans=0.125 2024-08-14 01:04:47,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2404930.0, ans=0.125 2024-08-14 01:04:55,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2405030.0, ans=0.125 2024-08-14 01:05:11,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2405130.0, ans=0.125 2024-08-14 01:05:11,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2405130.0, ans=0.04949747468305833 2024-08-14 01:05:23,161 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 01:05:24,976 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:05:29,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8650, loss[loss=0.1073, beats_loss=0.01282, ecapa_loss=0.0001663, whisper_loss=0.09279, over 21790.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.000159, whisper_loss=0.0913, over 3879520.52 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:05:39,364 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
21 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 01:05:50,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2405330.0, ans=0.1 2024-08-14 01:06:15,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.307e+01 2.549e+01 2.918e+01 2.030e+02, threshold=5.098e+01, percent-clipped=1.0 2024-08-14 01:06:31,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2405530.0, ans=0.2 2024-08-14 01:06:41,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2405630.0, ans=0.0 2024-08-14 01:06:49,036 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 01:06:58,174 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8700, loss[loss=0.1021, beats_loss=0.01125, ecapa_loss=0.0001257, whisper_loss=0.08958, over 15324.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001604, whisper_loss=0.09106, over 3855585.78 frames. ], batch size: 58, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:07:00,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2405730.0, ans=0.0 2024-08-14 01:07:14,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2405830.0, ans=0.125 2024-08-14 01:07:28,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2405830.0, ans=0.2 2024-08-14 01:07:58,510 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
20 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 01:08:00,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2406030.0, ans=0.125 2024-08-14 01:08:10,303 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 01:08:15,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2406130.0, ans=0.0 2024-08-14 01:08:22,226 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8750, loss[loss=0.1131, beats_loss=0.009294, ecapa_loss=0.000162, whisper_loss=0.1022, over 22779.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001616, whisper_loss=0.09101, over 3860730.26 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:08:28,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2406230.0, ans=0.09899494936611666 2024-08-14 01:08:32,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-08-14 01:08:58,029 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 01:09:14,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.356e+01 2.644e+01 3.033e+01 3.229e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 01:09:32,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2406530.0, ans=0.0 2024-08-14 01:09:42,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.27 vs. 
limit=15.0 2024-08-14 01:09:45,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.42 vs. limit=10.0 2024-08-14 01:09:47,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2406630.0, ans=0.0 2024-08-14 01:09:49,154 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.599e-01 2024-08-14 01:10:03,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2406730.0, ans=0.1 2024-08-14 01:10:05,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8800, loss[loss=0.09127, beats_loss=0.01212, ecapa_loss=0.0001418, whisper_loss=0.07772, over 18220.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001599, whisper_loss=0.09106, over 3849801.51 frames. ], batch size: 75, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:10:09,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2024-08-14 01:10:42,298 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 01:10:44,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2406930.0, ans=0.1 2024-08-14 01:10:48,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2406930.0, ans=0.5 2024-08-14 01:10:55,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2406930.0, ans=0.125 2024-08-14 01:11:03,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2407030.0, ans=0.07 2024-08-14 01:11:38,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.27 vs. limit=6.0 2024-08-14 01:11:39,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2407130.0, ans=0.2 2024-08-14 01:11:51,642 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 01:11:52,623 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8850, loss[loss=0.1126, beats_loss=0.009614, ecapa_loss=0.0001767, whisper_loss=0.1012, over 17084.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001581, whisper_loss=0.09102, over 3880046.08 frames. ], batch size: 69, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:11:57,973 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.807e-02 2024-08-14 01:12:15,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2407330.0, ans=0.0 2024-08-14 01:12:40,320 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
27 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-14 01:12:50,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2407430.0, ans=0.125 2024-08-14 01:12:53,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.365e+01 2.669e+01 3.063e+01 4.484e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-14 01:12:59,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-14 01:13:01,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. limit=10.0 2024-08-14 01:13:18,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2407530.0, ans=0.0 2024-08-14 01:13:28,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2407630.0, ans=0.125 2024-08-14 01:13:29,033 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 01:13:37,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2407630.0, ans=0.125 2024-08-14 01:13:45,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8900, loss[loss=0.09541, beats_loss=0.01061, ecapa_loss=0.000166, whisper_loss=0.08314, over 17106.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0108, ecapa_loss=0.0001583, whisper_loss=0.09153, over 3866666.05 frames. ], batch size: 69, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:13:51,277 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 01:14:09,202 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 01:14:15,382 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 01:14:42,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2407930.0, ans=0.0 2024-08-14 01:15:11,852 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 01:15:16,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=12.0 2024-08-14 01:15:21,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2408130.0, ans=0.2 2024-08-14 01:15:26,551 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 8950, loss[loss=0.09344, beats_loss=0.01128, ecapa_loss=0.0001389, whisper_loss=0.08076, over 17187.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001581, whisper_loss=0.09104, over 3852221.88 frames. ], batch size: 67, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:15:51,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2408330.0, ans=0.0 2024-08-14 01:16:03,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2408330.0, ans=0.0 2024-08-14 01:16:15,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.300e+01 2.488e+01 2.810e+01 4.417e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-14 01:16:35,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2408630.0, ans=0.0 2024-08-14 01:16:41,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2024-08-14 01:16:45,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2408630.0, ans=0.125 2024-08-14 01:16:50,472 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9000, loss[loss=0.1062, beats_loss=0.01066, ecapa_loss=0.0001563, whisper_loss=0.09394, over 16284.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001588, whisper_loss=0.09153, over 3856926.24 frames. ], batch size: 64, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:16:50,473 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 01:17:32,635 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005618, whisper_loss=0.2481, over 922467.00 frames. 2024-08-14 01:17:50,376 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on SV_voxceleb1: loss=0.004363, beats_loss=0, ecapa_loss=0.0004363, whisper_loss=0, over 939242.00 frames. 2024-08-14 01:20:00,266 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on AT_audioset: loss=0.02365, beats_loss=0.02365, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 01:20:00,270 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 01:20:03,021 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 01:20:07,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2408730.0, ans=0.0 2024-08-14 01:20:08,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2408730.0, ans=0.2 2024-08-14 01:20:22,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. 
limit=15.0 2024-08-14 01:20:58,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2409130.0, ans=0.125 2024-08-14 01:21:09,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2409130.0, ans=0.1 2024-08-14 01:21:12,208 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 01:21:13,266 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9050, loss[loss=0.1032, beats_loss=0.01182, ecapa_loss=0.0001511, whisper_loss=0.08989, over 22409.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001592, whisper_loss=0.09167, over 3884301.92 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:21:13,870 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 01:21:14,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2409230.0, ans=0.125 2024-08-14 01:21:39,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2409330.0, ans=0.0 2024-08-14 01:21:43,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-14 01:21:43,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2024-08-14 01:21:52,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.446e+01 2.670e+01 2.988e+01 4.436e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-14 01:22:03,029 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 01:22:07,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2409530.0, ans=0.125 2024-08-14 01:22:22,585 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 01:22:28,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9100, loss[loss=0.1265, beats_loss=0.007687, ecapa_loss=0.0001735, whisper_loss=0.1171, over 15802.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.00016, whisper_loss=0.09191, over 3843900.51 frames. ], batch size: 61, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:22:37,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2409730.0, ans=0.1 2024-08-14 01:22:45,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-08-14 01:22:50,289 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 01:23:06,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2409930.0, ans=0.125 2024-08-14 01:23:08,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.72 vs. limit=10.0 2024-08-14 01:23:08,475 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-14 01:23:10,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=2409930.0, ans=15.0 2024-08-14 01:23:33,037 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
26 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 01:23:42,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2410130.0, ans=0.125 2024-08-14 01:23:48,016 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9150, loss[loss=0.09202, beats_loss=0.01291, ecapa_loss=0.0001712, whisper_loss=0.07739, over 20151.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01081, ecapa_loss=0.0001598, whisper_loss=0.09085, over 3846997.26 frames. ], batch size: 87, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:24:04,574 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-14 01:24:10,884 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 01:24:18,585 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 01:24:29,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.433e+01 2.654e+01 2.886e+01 8.462e+01, threshold=5.308e+01, percent-clipped=1.0 2024-08-14 01:24:37,808 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.515e-01 2024-08-14 01:24:57,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2410630.0, ans=0.1 2024-08-14 01:25:03,305 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:25:08,571 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9200, loss[loss=0.09635, beats_loss=0.01151, ecapa_loss=0.0001733, whisper_loss=0.08311, over 15497.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001605, whisper_loss=0.09136, over 3827774.48 frames. 
], batch size: 64, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:25:19,204 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-14 01:25:34,922 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 01:25:39,925 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 01:25:47,576 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 01:25:48,957 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 01:26:00,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2411030.0, ans=0.125 2024-08-14 01:26:01,170 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 01:26:06,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2411030.0, ans=0.2 2024-08-14 01:26:19,651 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 01:26:30,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9250, loss[loss=0.09392, beats_loss=0.01027, ecapa_loss=0.0001511, whisper_loss=0.08213, over 22977.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001595, whisper_loss=0.09105, over 3867949.37 frames. 
], batch size: 89, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:26:42,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2411230.0, ans=0.0 2024-08-14 01:27:10,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.291e+01 2.608e+01 2.884e+01 5.366e+01, threshold=5.217e+01, percent-clipped=1.0 2024-08-14 01:27:14,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2411430.0, ans=10.0 2024-08-14 01:27:30,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2411530.0, ans=0.125 2024-08-14 01:27:31,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-08-14 01:27:32,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2411630.0, ans=0.2 2024-08-14 01:27:38,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2411630.0, ans=0.125 2024-08-14 01:27:45,344 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 01:27:46,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2411630.0, ans=0.0 2024-08-14 01:27:49,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9300, loss[loss=0.1028, beats_loss=0.01104, ecapa_loss=0.0001791, whisper_loss=0.08999, over 22437.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.000159, whisper_loss=0.09141, over 3885924.17 frames. 
], batch size: 93, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:27:51,521 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 01:28:32,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2411930.0, ans=0.1 2024-08-14 01:28:41,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2412030.0, ans=0.0 2024-08-14 01:28:43,892 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 01:28:56,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2412130.0, ans=0.1 2024-08-14 01:28:58,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2412130.0, ans=0.125 2024-08-14 01:29:01,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2412130.0, ans=0.125 2024-08-14 01:29:07,600 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9350, loss[loss=0.1053, beats_loss=0.01202, ecapa_loss=0.0001587, whisper_loss=0.09169, over 15333.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001583, whisper_loss=0.09201, over 3878695.79 frames. ], batch size: 62, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:29:07,763 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 01:29:08,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2412230.0, ans=0.1 2024-08-14 01:29:10,265 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 01:29:10,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2412230.0, ans=0.2 2024-08-14 01:29:10,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2412230.0, ans=0.0 2024-08-14 01:29:12,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2412230.0, ans=0.2 2024-08-14 01:29:28,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2412330.0, ans=0.125 2024-08-14 01:29:37,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2412430.0, ans=0.0 2024-08-14 01:29:42,927 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 01:29:47,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.279e+01 2.558e+01 2.915e+01 7.467e+01, threshold=5.116e+01, percent-clipped=2.0 2024-08-14 01:30:01,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2412530.0, ans=0.1 2024-08-14 01:30:26,368 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9400, loss[loss=0.1146, beats_loss=0.007974, ecapa_loss=0.0001795, whisper_loss=0.1048, over 15449.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01067, ecapa_loss=0.0001593, whisper_loss=0.09221, over 3859082.34 frames. ], batch size: 60, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:30:33,840 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 01:30:38,292 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
19 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 01:30:38,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2412730.0, ans=0.1 2024-08-14 01:31:17,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2413030.0, ans=0.2 2024-08-14 01:31:20,062 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 01:31:33,488 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 27 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-14 01:31:42,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2413130.0, ans=0.0 2024-08-14 01:31:47,907 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 01:31:48,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.02 vs. limit=6.0 2024-08-14 01:31:48,940 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9450, loss[loss=0.1143, beats_loss=0.0087, ecapa_loss=0.0001887, whisper_loss=0.1037, over 15385.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001587, whisper_loss=0.09139, over 3843984.69 frames. ], batch size: 61, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:31:57,656 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-14 01:32:11,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2413330.0, ans=0.0 2024-08-14 01:32:26,982 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 01:32:30,674 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 01:32:35,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.507e+01 2.797e+01 3.259e+01 9.131e+01, threshold=5.593e+01, percent-clipped=2.0 2024-08-14 01:32:35,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2413430.0, ans=0.125 2024-08-14 01:32:35,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2413430.0, ans=0.125 2024-08-14 01:32:40,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2413530.0, ans=0.125 2024-08-14 01:32:45,003 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 01:32:55,123 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 01:33:16,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9500, loss[loss=0.1051, beats_loss=0.01011, ecapa_loss=0.0001795, whisper_loss=0.09324, over 16691.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001587, whisper_loss=0.09218, over 3881772.40 frames. ], batch size: 67, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:33:17,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-08-14 01:33:45,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2413830.0, ans=0.0 2024-08-14 01:34:00,831 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
22 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-14 01:34:02,829 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.630e+00 2024-08-14 01:34:33,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2414130.0, ans=0.125 2024-08-14 01:34:36,771 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9550, loss[loss=0.09631, beats_loss=0.01122, ecapa_loss=0.0001557, whisper_loss=0.08353, over 16220.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001602, whisper_loss=0.0917, over 3854645.61 frames. ], batch size: 66, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:34:38,413 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 01:34:54,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2414330.0, ans=0.1 2024-08-14 01:35:02,285 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 01:35:03,698 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 01:35:17,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.395e+01 2.666e+01 3.161e+01 6.328e+01, threshold=5.331e+01, percent-clipped=1.0 2024-08-14 01:35:19,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2414430.0, ans=0.125 2024-08-14 01:35:21,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.95 vs. 
limit=15.0 2024-08-14 01:35:37,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2414530.0, ans=0.2 2024-08-14 01:35:45,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2024-08-14 01:35:47,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2414630.0, ans=0.1 2024-08-14 01:35:47,983 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0 2024-08-14 01:35:50,639 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 01:35:57,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9600, loss[loss=0.08016, beats_loss=0.01233, ecapa_loss=0.0001656, whisper_loss=0.06618, over 16872.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001606, whisper_loss=0.09148, over 3831459.77 frames. ], batch size: 72, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:36:05,914 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 01:36:19,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2414830.0, ans=0.125 2024-08-14 01:36:34,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2414930.0, ans=0.125 2024-08-14 01:36:41,929 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
30 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 01:36:53,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2415030.0, ans=0.0 2024-08-14 01:37:19,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2415130.0, ans=0.2 2024-08-14 01:37:21,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2415130.0, ans=0.2 2024-08-14 01:37:26,345 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9650, loss[loss=0.1055, beats_loss=0.01128, ecapa_loss=0.0001568, whisper_loss=0.09268, over 22535.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001601, whisper_loss=0.09118, over 3811834.80 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:37:39,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2415230.0, ans=0.125 2024-08-14 01:37:45,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2415330.0, ans=0.125 2024-08-14 01:37:59,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2415430.0, ans=0.125 2024-08-14 01:38:09,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.345e+01 2.616e+01 2.966e+01 4.263e+01, threshold=5.231e+01, percent-clipped=0.0 2024-08-14 01:38:12,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2415430.0, ans=0.125 2024-08-14 01:38:16,755 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 01:38:20,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-14 01:38:35,873 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 01:38:49,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9700, loss[loss=0.09199, beats_loss=0.0133, ecapa_loss=0.0001209, whisper_loss=0.07747, over 19889.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001613, whisper_loss=0.0906, over 3805986.73 frames. ], batch size: 80, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:38:57,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2415730.0, ans=0.1 2024-08-14 01:39:30,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2415930.0, ans=0.07 2024-08-14 01:39:37,340 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 01:39:51,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-08-14 01:39:57,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2416130.0, ans=0.125 2024-08-14 01:40:02,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2416130.0, ans=0.125 2024-08-14 01:40:10,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9750, loss[loss=0.1242, beats_loss=0.007906, ecapa_loss=0.0001802, whisper_loss=0.1145, over 16109.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001608, whisper_loss=0.09094, over 3830494.05 frames. ], batch size: 61, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:40:16,086 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-14 01:40:45,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2416430.0, ans=0.2 2024-08-14 01:40:46,603 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 01:40:51,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.366e+01 2.693e+01 3.078e+01 7.887e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-14 01:40:54,319 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 01:40:58,974 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 01:41:05,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2416530.0, ans=0.125 2024-08-14 01:41:14,991 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 01:41:20,512 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 01:41:24,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2416630.0, ans=0.07 2024-08-14 01:41:26,866 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9800, loss[loss=0.09675, beats_loss=0.01397, ecapa_loss=0.0001151, whisper_loss=0.08164, over 22550.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001612, whisper_loss=0.09072, over 3839277.55 frames. 
], batch size: 91, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:41:34,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2024-08-14 01:41:47,067 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 01:41:47,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2416830.0, ans=0.125 2024-08-14 01:42:24,818 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 01:42:28,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-08-14 01:42:28,834 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 01:42:29,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=15.0 2024-08-14 01:42:37,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9850, loss[loss=0.1224, beats_loss=0.008506, ecapa_loss=0.0001861, whisper_loss=0.112, over 18721.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001617, whisper_loss=0.09093, over 3834668.84 frames. ], batch size: 73, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:42:41,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2417230.0, ans=0.125 2024-08-14 01:43:11,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.319e+01 2.530e+01 2.883e+01 5.906e+01, threshold=5.059e+01, percent-clipped=1.0 2024-08-14 01:43:15,744 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 01:43:25,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2417530.0, ans=0.1 2024-08-14 01:43:30,052 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-14 01:43:34,347 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.197e-02 2024-08-14 01:43:44,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9900, loss[loss=0.1114, beats_loss=0.008619, ecapa_loss=0.0001787, whisper_loss=0.101, over 14305.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001608, whisper_loss=0.09171, over 3836220.21 frames. ], batch size: 56, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:43:55,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2024-08-14 01:44:32,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2418030.0, ans=0.2 2024-08-14 01:44:32,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2418030.0, ans=0.1 2024-08-14 01:44:37,613 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 01:44:48,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2418130.0, ans=0.125 2024-08-14 01:44:52,653 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 9950, loss[loss=0.13, beats_loss=0.008032, ecapa_loss=0.0001511, whisper_loss=0.1205, over 19610.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.000161, whisper_loss=0.09126, over 3841883.20 frames. 
], batch size: 74, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:45:17,723 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 01:45:26,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.412e+01 2.652e+01 3.138e+01 4.371e+01, threshold=5.303e+01, percent-clipped=0.0 2024-08-14 01:45:29,754 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 01:45:33,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2418530.0, ans=0.2 2024-08-14 01:45:39,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2418530.0, ans=0.2 2024-08-14 01:45:59,840 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10000, loss[loss=0.1156, beats_loss=0.01014, ecapa_loss=0.0001913, whisper_loss=0.1035, over 21147.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01082, ecapa_loss=0.0001621, whisper_loss=0.09085, over 3820356.39 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:46:07,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2418730.0, ans=0.2 2024-08-14 01:46:28,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2418930.0, ans=0.0 2024-08-14 01:46:28,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. 
limit=6.0 2024-08-14 01:46:36,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2418930.0, ans=0.0 2024-08-14 01:46:39,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-08-14 01:46:42,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2419030.0, ans=0.2 2024-08-14 01:47:05,556 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 01:47:06,582 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10050, loss[loss=0.1076, beats_loss=0.01081, ecapa_loss=0.0001389, whisper_loss=0.0954, over 18830.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.000162, whisper_loss=0.09027, over 3790632.41 frames. ], batch size: 73, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:47:15,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2419230.0, ans=0.125 2024-08-14 01:47:21,574 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 01:47:21,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2419330.0, ans=0.2 2024-08-14 01:47:23,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2419330.0, ans=0.0 2024-08-14 01:47:29,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2419330.0, ans=0.125 2024-08-14 01:47:39,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2419430.0, ans=0.125 2024-08-14 01:47:41,566 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-14 01:47:42,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.384e+01 2.686e+01 2.960e+01 2.282e+02, threshold=5.371e+01, percent-clipped=3.0 2024-08-14 01:48:00,703 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 01:48:03,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2419630.0, ans=0.1 2024-08-14 01:48:14,340 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10100, loss[loss=0.1105, beats_loss=0.009716, ecapa_loss=0.0002079, whisper_loss=0.09868, over 22332.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001634, whisper_loss=0.09085, over 3839683.64 frames. 
], batch size: 91, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:48:25,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2419730.0, ans=0.2 2024-08-14 01:48:31,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2419830.0, ans=0.0 2024-08-14 01:48:36,693 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 01:48:38,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2419830.0, ans=0.1 2024-08-14 01:48:43,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2419930.0, ans=0.1 2024-08-14 01:48:46,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2419930.0, ans=0.025 2024-08-14 01:49:01,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2420030.0, ans=0.125 2024-08-14 01:49:24,074 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10150, loss[loss=0.1213, beats_loss=0.01027, ecapa_loss=0.0001561, whisper_loss=0.1095, over 23785.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001635, whisper_loss=0.09076, over 3867593.70 frames. ], batch size: 93, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:49:38,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2420330.0, ans=0.2 2024-08-14 01:49:50,755 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 01:49:54,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2420430.0, ans=0.125 2024-08-14 01:50:05,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.404e+01 2.645e+01 2.951e+01 4.259e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-14 01:50:09,206 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 01:50:43,663 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10200, loss[loss=0.1006, beats_loss=0.01181, ecapa_loss=0.0001455, whisper_loss=0.08732, over 22943.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001622, whisper_loss=0.09096, over 3857805.10 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:50:45,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2420730.0, ans=0.125 2024-08-14 01:50:55,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2420730.0, ans=0.2 2024-08-14 01:50:59,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2420830.0, ans=0.125 2024-08-14 01:51:05,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2420830.0, ans=0.0 2024-08-14 01:51:10,637 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 01:51:15,446 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. 
limit=6.0 2024-08-14 01:52:13,233 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10250, loss[loss=0.1124, beats_loss=0.01014, ecapa_loss=0.0001343, whisper_loss=0.1009, over 16950.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001611, whisper_loss=0.09162, over 3864774.75 frames. ], batch size: 64, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:52:16,356 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 31 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 01:52:23,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2421230.0, ans=0.05 2024-08-14 01:52:28,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2421330.0, ans=0.0 2024-08-14 01:52:43,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2421330.0, ans=0.0 2024-08-14 01:52:45,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2421330.0, ans=0.2 2024-08-14 01:52:47,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.19 vs. 
limit=15.0 2024-08-14 01:52:48,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2421430.0, ans=0.2 2024-08-14 01:53:01,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.474e+01 2.733e+01 3.124e+01 2.948e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-14 01:53:27,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2421630.0, ans=0.2 2024-08-14 01:53:29,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-08-14 01:53:37,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2421630.0, ans=0.0 2024-08-14 01:53:42,567 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10300, loss[loss=0.101, beats_loss=0.01066, ecapa_loss=0.0001465, whisper_loss=0.08891, over 15134.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001605, whisper_loss=0.09048, over 3838434.81 frames. ], batch size: 57, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:54:18,587 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 01:54:25,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2421930.0, ans=0.0 2024-08-14 01:54:31,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2421930.0, ans=0.5 2024-08-14 01:54:31,417 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. 
limit=10.0 2024-08-14 01:54:34,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2421930.0, ans=0.09899494936611666 2024-08-14 01:54:54,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2422130.0, ans=0.1 2024-08-14 01:55:06,674 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 01:55:11,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10350, loss[loss=0.1104, beats_loss=0.01026, ecapa_loss=0.0001565, whisper_loss=0.09853, over 19693.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001605, whisper_loss=0.09053, over 3865521.42 frames. ], batch size: 78, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:55:13,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2422230.0, ans=0.125 2024-08-14 01:55:23,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2422230.0, ans=0.125 2024-08-14 01:55:35,647 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 01:55:35,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2422330.0, ans=0.125 2024-08-14 01:55:37,608 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
37 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 01:55:54,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2422430.0, ans=0.125 2024-08-14 01:55:56,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.357e+01 2.601e+01 3.091e+01 4.779e+01, threshold=5.203e+01, percent-clipped=0.0 2024-08-14 01:56:14,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2422530.0, ans=0.0 2024-08-14 01:56:32,527 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10400, loss[loss=0.1107, beats_loss=0.01213, ecapa_loss=0.0001146, whisper_loss=0.09739, over 23334.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001599, whisper_loss=0.09164, over 3895953.73 frames. ], batch size: 89, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:56:50,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2422830.0, ans=0.125 2024-08-14 01:57:13,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2423030.0, ans=0.125 2024-08-14 01:57:17,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2423030.0, ans=0.0 2024-08-14 01:57:19,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2423030.0, ans=0.2 2024-08-14 01:57:22,905 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 01:57:30,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2423130.0, ans=0.0 2024-08-14 01:57:33,749 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 01:57:41,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10450, loss[loss=0.1078, beats_loss=0.006923, ecapa_loss=0.0002572, whisper_loss=0.09833, over 14036.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.00016, whisper_loss=0.09135, over 3875309.36 frames. ], batch size: 60, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:57:53,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.85 vs. limit=8.0 2024-08-14 01:58:07,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2423430.0, ans=0.5 2024-08-14 01:58:15,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2423430.0, ans=0.0 2024-08-14 01:58:16,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.463e+01 2.702e+01 3.082e+01 4.541e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-14 01:58:25,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2423530.0, ans=0.1 2024-08-14 01:58:33,357 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 37 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 01:58:47,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10500, loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.000151, whisper_loss=0.09068, over 14351.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01061, ecapa_loss=0.0001617, whisper_loss=0.09186, over 3864412.59 frames. ], batch size: 54, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:58:49,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2024-08-14 01:58:50,104 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-14 01:59:00,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2423830.0, ans=0.2 2024-08-14 01:59:00,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2423830.0, ans=0.125 2024-08-14 01:59:05,764 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 01:59:12,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-08-14 01:59:52,585 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10550, loss[loss=0.09703, beats_loss=0.01078, ecapa_loss=0.0001769, whisper_loss=0.08448, over 20801.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001621, whisper_loss=0.09147, over 3867281.57 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:00:06,893 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 02:00:11,664 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 02:00:11,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2424330.0, ans=0.1 2024-08-14 02:00:26,080 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-14 02:00:27,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.316e+01 2.599e+01 2.857e+01 9.329e+01, threshold=5.198e+01, percent-clipped=3.0 2024-08-14 02:00:40,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2424530.0, ans=0.2 2024-08-14 02:00:42,760 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 02:00:48,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.63 vs. limit=15.0 2024-08-14 02:00:57,032 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10600, loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.000128, whisper_loss=0.09147, over 17313.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001618, whisper_loss=0.09087, over 3886221.84 frames. ], batch size: 65, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:01:04,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2424730.0, ans=0.1 2024-08-14 02:01:04,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2424730.0, ans=0.125 2024-08-14 02:01:23,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2424930.0, ans=0.0 2024-08-14 02:01:24,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2424930.0, ans=0.0 2024-08-14 02:01:30,442 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
27 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-14 02:01:42,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2425030.0, ans=0.125 2024-08-14 02:01:57,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5 2024-08-14 02:02:01,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10650, loss[loss=0.1147, beats_loss=0.007024, ecapa_loss=0.0001708, whisper_loss=0.106, over 16223.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001613, whisper_loss=0.09099, over 3898531.02 frames. ], batch size: 67, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:02:25,403 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 02:02:37,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.364e+01 2.670e+01 2.895e+01 4.194e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-14 02:02:40,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2425530.0, ans=0.1 2024-08-14 02:02:41,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2425530.0, ans=0.0 2024-08-14 02:02:41,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2425530.0, ans=0.04949747468305833 2024-08-14 02:02:43,640 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 02:02:55,702 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 02:03:07,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10700, loss[loss=0.1139, beats_loss=0.01145, ecapa_loss=0.000162, whisper_loss=0.1009, over 16064.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01061, ecapa_loss=0.0001601, whisper_loss=0.09189, over 3877105.84 frames. ], batch size: 63, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:03:12,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2425730.0, ans=0.04949747468305833 2024-08-14 02:03:12,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2425730.0, ans=0.125 2024-08-14 02:03:14,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2425730.0, ans=0.0 2024-08-14 02:03:16,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2425730.0, ans=0.2 2024-08-14 02:03:32,205 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 02:03:41,555 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 02:03:43,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2024-08-14 02:03:53,495 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-14 02:03:59,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2426130.0, ans=0.07 2024-08-14 02:04:11,353 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 02:04:12,477 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
29 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 02:04:13,577 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10750, loss[loss=0.1123, beats_loss=0.01137, ecapa_loss=0.0001249, whisper_loss=0.09967, over 22243.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01063, ecapa_loss=0.0001602, whisper_loss=0.0924, over 3902902.51 frames. ], batch size: 83, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:04:29,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2426330.0, ans=0.0 2024-08-14 02:04:29,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2426330.0, ans=0.0 2024-08-14 02:04:49,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.445e+01 2.667e+01 2.966e+01 4.209e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-14 02:05:01,276 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 02:05:15,288 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 02:05:20,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10800, loss[loss=0.1279, beats_loss=0.008663, ecapa_loss=0.0001633, whisper_loss=0.1176, over 16344.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01066, ecapa_loss=0.0001606, whisper_loss=0.0931, over 3934264.14 frames. ], batch size: 62, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:05:28,973 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 02:05:43,154 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.435e+00 2024-08-14 02:05:44,577 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 02:05:47,745 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
22 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 02:05:51,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2426930.0, ans=0.125 2024-08-14 02:05:55,158 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 02:06:31,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2427130.0, ans=0.1 2024-08-14 02:06:39,689 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10850, loss[loss=0.1212, beats_loss=0.01057, ecapa_loss=0.0001698, whisper_loss=0.1089, over 21768.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01079, ecapa_loss=0.0001603, whisper_loss=0.09249, over 3937837.08 frames. ], batch size: 84, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:06:40,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-14 02:06:44,111 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 02:06:58,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2427330.0, ans=10.0 2024-08-14 02:07:03,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2427330.0, ans=0.125 2024-08-14 02:07:05,505 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2024-08-14 02:07:15,426 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 02:07:16,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2427430.0, ans=0.125 2024-08-14 02:07:21,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.381e+01 2.677e+01 3.006e+01 4.441e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-14 02:07:41,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2427530.0, ans=0.125 2024-08-14 02:07:57,461 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10900, loss[loss=0.09129, beats_loss=0.01286, ecapa_loss=0.0002152, whisper_loss=0.07627, over 20321.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01065, ecapa_loss=0.0001596, whisper_loss=0.09296, over 3949127.46 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:08:15,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2427830.0, ans=0.1 2024-08-14 02:08:33,570 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 02:08:36,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2427930.0, ans=0.125 2024-08-14 02:08:51,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.45 vs. limit=22.5 2024-08-14 02:09:00,664 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 02:09:01,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2428130.0, ans=0.0 2024-08-14 02:09:09,154 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 10950, loss[loss=0.0851, beats_loss=0.01375, ecapa_loss=0.000133, whisper_loss=0.07002, over 14664.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01073, ecapa_loss=0.0001589, whisper_loss=0.09279, over 3913145.36 frames. ], batch size: 61, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:09:29,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2428330.0, ans=0.0 2024-08-14 02:09:36,824 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 02:09:46,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.387e+01 2.678e+01 3.232e+01 4.538e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-14 02:10:05,116 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 02:10:05,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2428630.0, ans=22.5 2024-08-14 02:10:17,045 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11000, loss[loss=0.09135, beats_loss=0.01154, ecapa_loss=0.0001741, whisper_loss=0.07807, over 19081.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001589, whisper_loss=0.09163, over 3908227.78 frames. 
], batch size: 81, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:10:25,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2428730.0, ans=0.0 2024-08-14 02:10:30,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2428830.0, ans=0.125 2024-08-14 02:10:33,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2428830.0, ans=0.125 2024-08-14 02:10:46,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-08-14 02:11:01,512 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 02:11:04,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2429030.0, ans=0.125 2024-08-14 02:11:13,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2429130.0, ans=0.0 2024-08-14 02:11:13,564 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-08-14 02:11:23,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11050, loss[loss=0.1086, beats_loss=0.0124, ecapa_loss=0.000125, whisper_loss=0.09497, over 24258.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.000159, whisper_loss=0.09152, over 3937471.61 frames. 
], batch size: 92, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:11:31,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2429230.0, ans=0.0 2024-08-14 02:11:32,477 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 19 from Vox, 14 fro AS 2024-08-14 02:11:35,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2429330.0, ans=0.125 2024-08-14 02:11:44,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2429330.0, ans=0.0 2024-08-14 02:11:58,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.348e+01 2.595e+01 2.854e+01 6.191e+01, threshold=5.189e+01, percent-clipped=1.0 2024-08-14 02:11:59,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2429430.0, ans=0.125 2024-08-14 02:12:20,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2429630.0, ans=0.2 2024-08-14 02:12:28,374 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11100, loss[loss=0.08346, beats_loss=0.0127, ecapa_loss=0.0001192, whisper_loss=0.06957, over 18907.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001585, whisper_loss=0.09149, over 3951614.24 frames. ], batch size: 73, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:12:31,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2429730.0, ans=0.125 2024-08-14 02:12:52,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2429830.0, ans=0.2 2024-08-14 02:13:19,753 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
30 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 02:13:26,210 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 02:13:30,312 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-14 02:13:35,889 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11150, loss[loss=0.08909, beats_loss=0.01135, ecapa_loss=0.000181, whisper_loss=0.07593, over 19590.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001588, whisper_loss=0.0913, over 3927888.48 frames. ], batch size: 81, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:13:51,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2430330.0, ans=0.125 2024-08-14 02:14:04,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2024-08-14 02:14:05,753 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:14:12,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.319e+01 2.556e+01 2.861e+01 3.873e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-14 02:14:28,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2430630.0, ans=0.125 2024-08-14 02:14:43,536 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11200, loss[loss=0.1041, beats_loss=0.01058, ecapa_loss=0.0001532, whisper_loss=0.09195, over 21719.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001585, whisper_loss=0.0913, over 3935699.63 frames. ], batch size: 84, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:14:58,345 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 02:15:08,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2430830.0, ans=0.125 2024-08-14 02:15:15,958 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 02:15:21,450 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 02:15:38,791 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 02:15:44,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2431130.0, ans=0.05 2024-08-14 02:15:49,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2431230.0, ans=0.1 2024-08-14 02:15:50,377 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11250, loss[loss=0.1132, beats_loss=0.00835, ecapa_loss=0.0002039, whisper_loss=0.1028, over 13677.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001595, whisper_loss=0.09114, over 3910159.99 frames. 
], batch size: 55, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:16:06,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2431330.0, ans=0.125 2024-08-14 02:16:08,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2431330.0, ans=0.0 2024-08-14 02:16:14,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2431330.0, ans=0.125 2024-08-14 02:16:26,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2431430.0, ans=0.0 2024-08-14 02:16:27,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.401e+01 2.755e+01 3.055e+01 4.281e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-14 02:16:54,242 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 02:16:57,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2431730.0, ans=0.125 2024-08-14 02:16:57,988 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11300, loss[loss=0.1177, beats_loss=0.009759, ecapa_loss=0.0001716, whisper_loss=0.1062, over 21015.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001584, whisper_loss=0.09076, over 3906428.45 frames. ], batch size: 83, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:17:05,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2024-08-14 02:17:08,732 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
35 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 02:17:17,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2431830.0, ans=0.125 2024-08-14 02:17:20,575 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 02:17:36,924 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 02:17:52,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2432130.0, ans=0.125 2024-08-14 02:18:03,331 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 02:18:04,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11350, loss[loss=0.1091, beats_loss=0.01067, ecapa_loss=0.000134, whisper_loss=0.09705, over 21964.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001584, whisper_loss=0.09112, over 3909969.85 frames. ], batch size: 85, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:18:15,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2432230.0, ans=0.125 2024-08-14 02:18:22,481 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 02:18:27,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2432330.0, ans=0.05 2024-08-14 02:18:28,845 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 02:18:30,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2432430.0, ans=0.0 2024-08-14 02:18:32,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.67 vs. 
limit=22.5 2024-08-14 02:18:38,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2432430.0, ans=0.125 2024-08-14 02:18:41,296 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.318e+01 2.543e+01 2.878e+01 4.882e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-14 02:18:45,751 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-14 02:18:51,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2432530.0, ans=0.0 2024-08-14 02:18:51,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2432530.0, ans=0.1 2024-08-14 02:19:03,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2432630.0, ans=0.125 2024-08-14 02:19:06,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2432630.0, ans=0.125 2024-08-14 02:19:11,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11400, loss[loss=0.1338, beats_loss=0.007696, ecapa_loss=0.00019, whisper_loss=0.1242, over 22630.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001579, whisper_loss=0.09124, over 3863725.55 frames. ], batch size: 89, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:19:44,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2432930.0, ans=0.125 2024-08-14 02:19:56,160 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
29 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 02:20:00,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2433030.0, ans=0.125 2024-08-14 02:20:04,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2433130.0, ans=0.95 2024-08-14 02:20:06,553 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 02:20:18,426 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11450, loss[loss=0.08959, beats_loss=0.01006, ecapa_loss=0.0001901, whisper_loss=0.07763, over 16378.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001579, whisper_loss=0.0915, over 3880234.40 frames. ], batch size: 67, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:20:18,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2433230.0, ans=0.125 2024-08-14 02:20:32,285 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 02:20:41,698 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 02:20:43,055 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-14 02:20:46,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2433430.0, ans=0.125 2024-08-14 02:20:53,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2433430.0, ans=0.0 2024-08-14 02:20:56,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.469e+01 2.659e+01 2.977e+01 4.724e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 02:21:02,666 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 27 from Vox, 18 fro AS 2024-08-14 02:21:05,155 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 02:21:05,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2433530.0, ans=0.1 2024-08-14 02:21:16,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-14 02:21:17,316 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 02:21:27,544 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11500, loss[loss=0.0949, beats_loss=0.01153, ecapa_loss=0.0001297, whisper_loss=0.08207, over 15779.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001571, whisper_loss=0.09173, over 3908067.04 frames. ], batch size: 62, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:21:29,016 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 02:21:33,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. 
limit=15.0 2024-08-14 02:21:38,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2433730.0, ans=0.2 2024-08-14 02:21:42,556 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 02:21:45,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2433830.0, ans=0.125 2024-08-14 02:22:10,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2434030.0, ans=0.1 2024-08-14 02:22:18,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2434030.0, ans=0.0 2024-08-14 02:22:21,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.06 vs. limit=6.0 2024-08-14 02:22:34,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11550, loss[loss=0.09448, beats_loss=0.009553, ecapa_loss=0.0001577, whisper_loss=0.08335, over 15031.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001576, whisper_loss=0.09223, over 3868701.91 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:22:39,384 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 02:22:45,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2434230.0, ans=0.125 2024-08-14 02:23:03,894 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 02:23:09,161 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
26 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-14 02:23:10,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.439e+01 2.716e+01 3.080e+01 3.847e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-14 02:23:16,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2434530.0, ans=0.125 2024-08-14 02:23:26,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2434530.0, ans=0.125 2024-08-14 02:23:28,257 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 02:23:41,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11600, loss[loss=0.1034, beats_loss=0.0123, ecapa_loss=0.0001747, whisper_loss=0.08936, over 21728.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001572, whisper_loss=0.09185, over 3894550.41 frames. ], batch size: 90, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:23:55,133 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 02:24:00,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2434830.0, ans=0.0 2024-08-14 02:24:02,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2434830.0, ans=0.025 2024-08-14 02:24:17,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-14 02:24:19,401 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 02:24:21,884 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 02:24:23,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2435030.0, ans=0.0 2024-08-14 02:24:27,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2435030.0, ans=0.07 2024-08-14 02:24:31,637 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 02:24:38,357 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 02:24:39,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2435130.0, ans=0.125 2024-08-14 02:24:48,976 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-14 02:24:52,709 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11650, loss[loss=0.07859, beats_loss=0.01355, ecapa_loss=0.0001901, whisper_loss=0.06314, over 20800.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001576, whisper_loss=0.0913, over 3914527.51 frames. ], batch size: 91, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:24:59,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2435230.0, ans=0.125 2024-08-14 02:25:10,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2435330.0, ans=15.0 2024-08-14 02:25:19,488 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 02:25:30,556 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
22 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-14 02:25:32,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.452e+01 2.685e+01 3.038e+01 4.511e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-14 02:25:32,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2435430.0, ans=0.125 2024-08-14 02:25:33,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.61 vs. limit=6.0 2024-08-14 02:25:54,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2435630.0, ans=0.125 2024-08-14 02:26:01,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2024-08-14 02:26:06,675 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11700, loss[loss=0.1259, beats_loss=0.01001, ecapa_loss=0.0001349, whisper_loss=0.1145, over 15356.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001586, whisper_loss=0.09131, over 3946620.77 frames. ], batch size: 58, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:26:08,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2435730.0, ans=0.125 2024-08-14 02:26:45,748 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 02:26:49,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2435930.0, ans=0.125 2024-08-14 02:26:59,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2436030.0, ans=0.125 2024-08-14 02:27:06,857 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 02:27:16,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2436130.0, ans=0.125 2024-08-14 02:27:23,957 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11750, loss[loss=0.1226, beats_loss=0.007884, ecapa_loss=0.0001807, whisper_loss=0.1129, over 19395.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001583, whisper_loss=0.09165, over 3984322.06 frames. ], batch size: 77, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:27:24,117 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 02:27:25,219 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 02:27:34,098 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 02:28:03,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.416e+01 2.659e+01 3.015e+01 1.752e+02, threshold=5.317e+01, percent-clipped=1.0 2024-08-14 02:28:04,012 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-14 02:28:10,887 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 02:28:18,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. 
limit=8.0 2024-08-14 02:28:37,420 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11800, loss[loss=0.07024, beats_loss=0.01361, ecapa_loss=0.0001443, whisper_loss=0.05519, over 19135.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001593, whisper_loss=0.09174, over 3958243.04 frames. ], batch size: 79, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:28:51,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=12.0 2024-08-14 02:29:39,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2437130.0, ans=0.0 2024-08-14 02:29:46,158 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11850, loss[loss=0.09354, beats_loss=0.01226, ecapa_loss=0.0001699, whisper_loss=0.07958, over 18984.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01081, ecapa_loss=0.0001588, whisper_loss=0.09253, over 3958915.30 frames. ], batch size: 79, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:29:54,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2437230.0, ans=0.125 2024-08-14 02:29:56,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2437230.0, ans=0.1 2024-08-14 02:30:19,944 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 02:30:21,087 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 02:30:23,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.460e+01 2.783e+01 3.243e+01 6.982e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-14 02:30:38,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2437530.0, ans=0.2 2024-08-14 02:30:50,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2437630.0, ans=0.0 2024-08-14 02:30:54,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2437730.0, ans=0.125 2024-08-14 02:30:55,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11900, loss[loss=0.09761, beats_loss=0.01227, ecapa_loss=0.0001323, whisper_loss=0.08402, over 21805.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01078, ecapa_loss=0.0001596, whisper_loss=0.09227, over 3965891.82 frames. ], batch size: 89, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:31:07,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2437830.0, ans=0.125 2024-08-14 02:31:15,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2437830.0, ans=0.125 2024-08-14 02:31:34,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=2437930.0, ans=12.0 2024-08-14 02:31:36,514 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 02:31:39,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2438030.0, ans=0.0 2024-08-14 02:31:40,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2438030.0, ans=0.125 2024-08-14 02:31:47,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2024-08-14 02:31:53,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2438130.0, ans=0.1 2024-08-14 02:31:56,915 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 02:32:03,873 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 11950, loss[loss=0.1149, beats_loss=0.009129, ecapa_loss=0.0001851, whisper_loss=0.1039, over 21800.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01071, ecapa_loss=0.00016, whisper_loss=0.09199, over 3949206.39 frames. ], batch size: 87, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:32:08,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2438230.0, ans=0.125 2024-08-14 02:32:12,385 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 17 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-14 02:32:20,052 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 02:32:21,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.50 vs. 
limit=22.5 2024-08-14 02:32:42,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.634e+01 2.951e+01 4.369e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-14 02:32:57,338 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-08-14 02:32:58,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2438530.0, ans=0.125 2024-08-14 02:33:15,496 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12000, loss[loss=0.1051, beats_loss=0.009118, ecapa_loss=0.0001852, whisper_loss=0.09412, over 19725.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001609, whisper_loss=0.09203, over 3937340.66 frames. ], batch size: 78, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:33:15,496 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 02:34:00,811 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005541, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 02:34:21,366 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on SV_voxceleb1: loss=0.004448, beats_loss=0, ecapa_loss=0.0004448, whisper_loss=0, over 939242.00 frames. 2024-08-14 02:36:27,261 INFO [train_multi_KD3.py:1149] (1/4) Epoch 17, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 02:36:27,265 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 02:36:52,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2438830.0, ans=0.2 2024-08-14 02:36:59,051 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 02:37:02,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2438930.0, ans=0.1 2024-08-14 02:37:26,850 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 02:37:33,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2439130.0, ans=0.0 2024-08-14 02:37:36,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12050, loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001493, whisper_loss=0.0903, over 21965.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01055, ecapa_loss=0.00016, whisper_loss=0.09256, over 3906643.59 frames. ], batch size: 86, lr: 3.65e-03, grad_scale: 1.152921504606847e+18 2024-08-14 02:37:46,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2439230.0, ans=0.1 2024-08-14 02:37:47,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=22.5 2024-08-14 02:37:51,609 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-14 02:38:03,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2439430.0, ans=0.2 2024-08-14 02:38:06,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2439430.0, ans=0.125 2024-08-14 02:38:12,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2439430.0, ans=0.0 2024-08-14 02:38:13,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2439430.0, ans=0.125 2024-08-14 02:38:14,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.389e+01 2.572e+01 2.864e+01 7.729e+01, threshold=5.144e+01, percent-clipped=2.0 2024-08-14 02:38:18,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2439530.0, ans=0.1 2024-08-14 02:38:43,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-14 02:38:44,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12100, loss[loss=0.1011, beats_loss=0.01109, ecapa_loss=0.0001903, whisper_loss=0.08811, over 17912.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01062, ecapa_loss=0.0001594, whisper_loss=0.09179, over 3895067.97 frames. ], batch size: 79, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:38:52,782 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 34 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 02:38:57,007 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 02:39:20,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2439930.0, ans=0.125 2024-08-14 02:39:40,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-14 02:39:42,907 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 02:39:44,378 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 02:40:01,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12150, loss[loss=0.1031, beats_loss=0.009589, ecapa_loss=0.0001518, whisper_loss=0.09198, over 16494.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001603, whisper_loss=0.0918, over 3869341.37 frames. ], batch size: 63, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:40:12,708 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 02:40:15,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2440330.0, ans=0.125 2024-08-14 02:40:26,289 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.47 vs. 
limit=12.0 2024-08-14 02:40:38,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2440430.0, ans=0.125 2024-08-14 02:40:39,132 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:40:44,597 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.448e+01 2.795e+01 3.138e+01 2.484e+02, threshold=5.590e+01, percent-clipped=2.0 2024-08-14 02:40:46,923 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 02:41:12,935 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 02:41:13,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2440630.0, ans=0.125 2024-08-14 02:41:14,238 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 02:41:18,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12200, loss[loss=0.08871, beats_loss=0.01284, ecapa_loss=0.0001428, whisper_loss=0.07444, over 16594.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001602, whisper_loss=0.09159, over 3869876.84 frames. ], batch size: 68, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:41:20,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2440730.0, ans=0.125 2024-08-14 02:41:22,157 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 02:41:22,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2440730.0, ans=0.0 2024-08-14 02:41:27,704 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
20 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-14 02:41:40,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2440830.0, ans=0.0 2024-08-14 02:41:44,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2440830.0, ans=0.125 2024-08-14 02:41:55,762 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 15 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 02:41:56,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2440930.0, ans=0.1 2024-08-14 02:42:08,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2441030.0, ans=0.125 2024-08-14 02:42:16,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2441030.0, ans=0.0 2024-08-14 02:42:33,204 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12250, loss[loss=0.09608, beats_loss=0.01122, ecapa_loss=0.0001989, whisper_loss=0.08286, over 21350.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01057, ecapa_loss=0.0001602, whisper_loss=0.09229, over 3903533.64 frames. ], batch size: 91, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:42:45,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2441230.0, ans=0.125 2024-08-14 02:43:08,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. 
limit=22.5 2024-08-14 02:43:14,069 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.529e+01 2.845e+01 3.228e+01 1.360e+02, threshold=5.691e+01, percent-clipped=2.0 2024-08-14 02:43:17,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-08-14 02:43:38,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2441630.0, ans=0.0 2024-08-14 02:43:46,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12300, loss[loss=0.121, beats_loss=0.009461, ecapa_loss=0.0001743, whisper_loss=0.1098, over 17071.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001588, whisper_loss=0.09168, over 3922307.68 frames. ], batch size: 67, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:43:46,961 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-14 02:43:52,939 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 02:44:00,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2441830.0, ans=0.0 2024-08-14 02:44:04,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2441830.0, ans=0.0 2024-08-14 02:44:08,909 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 02:44:11,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.19 vs. 
limit=22.5 2024-08-14 02:44:14,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2441830.0, ans=0.1 2024-08-14 02:44:15,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2441930.0, ans=0.1 2024-08-14 02:44:20,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2441930.0, ans=0.125 2024-08-14 02:44:43,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2442130.0, ans=0.125 2024-08-14 02:44:56,991 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12350, loss[loss=0.09324, beats_loss=0.009925, ecapa_loss=0.0001598, whisper_loss=0.08172, over 21850.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001596, whisper_loss=0.09071, over 3898114.73 frames. ], batch size: 90, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:45:03,590 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 02:45:17,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2442330.0, ans=0.0 2024-08-14 02:45:29,363 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.349e+05 2024-08-14 02:45:34,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.343e+01 2.707e+01 2.893e+01 7.539e+01, threshold=5.413e+01, percent-clipped=2.0 2024-08-14 02:46:03,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12400, loss[loss=0.09746, beats_loss=0.01056, ecapa_loss=0.0001583, whisper_loss=0.08532, over 14413.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001594, whisper_loss=0.0913, over 3894941.07 frames. 
], batch size: 56, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:46:07,110 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 36 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 02:46:09,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2442730.0, ans=0.05 2024-08-14 02:46:41,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2443030.0, ans=0.125 2024-08-14 02:46:42,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2443030.0, ans=0.125 2024-08-14 02:47:03,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2443130.0, ans=0.1 2024-08-14 02:47:05,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2443130.0, ans=0.1 2024-08-14 02:47:07,514 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12450, loss[loss=0.1245, beats_loss=0.009332, ecapa_loss=0.000128, whisper_loss=0.1139, over 17801.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001599, whisper_loss=0.09121, over 3906073.63 frames. ], batch size: 65, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:47:07,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2443230.0, ans=0.125 2024-08-14 02:47:19,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2443330.0, ans=0.125 2024-08-14 02:47:22,065 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-14 02:47:24,609 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 02:47:35,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2443430.0, ans=0.125 2024-08-14 02:47:41,571 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 02:47:44,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.385e+01 2.629e+01 3.074e+01 4.896e+01, threshold=5.258e+01, percent-clipped=0.0 2024-08-14 02:47:44,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2443430.0, ans=0.125 2024-08-14 02:47:48,125 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 02:47:56,523 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0 2024-08-14 02:48:06,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2443630.0, ans=0.125 2024-08-14 02:48:11,728 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 02:48:12,820 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12500, loss[loss=0.1163, beats_loss=0.008798, ecapa_loss=0.0001772, whisper_loss=0.1057, over 22457.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01063, ecapa_loss=0.0001595, whisper_loss=0.09183, over 3922420.53 frames. ], batch size: 89, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:48:18,260 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 02:48:19,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2443730.0, ans=0.1 2024-08-14 02:48:24,522 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 02:48:26,233 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2024-08-14 02:48:47,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2443930.0, ans=0.125 2024-08-14 02:48:59,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2444030.0, ans=0.125 2024-08-14 02:49:02,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-08-14 02:49:16,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=12.0 2024-08-14 02:49:17,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12550, loss[loss=0.07973, beats_loss=0.01268, ecapa_loss=0.0002049, whisper_loss=0.065, over 18894.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.0001597, whisper_loss=0.09155, over 3939529.34 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:49:17,831 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 02:49:28,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2444230.0, ans=0.0 2024-08-14 02:49:40,363 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 02:49:44,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2444430.0, ans=0.125 2024-08-14 02:49:50,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=12.0 2024-08-14 02:49:54,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.347e+01 2.679e+01 3.056e+01 5.301e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-14 02:50:03,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2444530.0, ans=0.125 2024-08-14 02:50:06,510 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 02:50:08,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-14 02:50:22,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12600, loss[loss=0.1153, beats_loss=0.01171, ecapa_loss=0.0001491, whisper_loss=0.1021, over 22117.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001595, whisper_loss=0.09164, over 3930048.81 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:50:22,937 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 02:50:27,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2444730.0, ans=0.2 2024-08-14 02:50:32,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. 
limit=6.0 2024-08-14 02:50:34,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0 2024-08-14 02:50:37,978 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 02:50:49,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2444930.0, ans=0.125 2024-08-14 02:50:49,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2444930.0, ans=0.2 2024-08-14 02:51:00,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2445030.0, ans=0.1 2024-08-14 02:51:12,752 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 02:51:19,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-14 02:51:27,246 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12650, loss[loss=0.08929, beats_loss=0.008413, ecapa_loss=0.0001834, whisper_loss=0.07904, over 14404.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001587, whisper_loss=0.0922, over 3916757.30 frames. 
], batch size: 58, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:51:50,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2445330.0, ans=0.125 2024-08-14 02:52:02,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2445430.0, ans=0.2 2024-08-14 02:52:03,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.374e+01 2.633e+01 2.976e+01 1.427e+02, threshold=5.265e+01, percent-clipped=1.0 2024-08-14 02:52:16,828 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-14 02:52:25,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2445630.0, ans=0.035 2024-08-14 02:52:32,224 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12700, loss[loss=0.09806, beats_loss=0.01234, ecapa_loss=0.0001554, whisper_loss=0.08416, over 22597.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001574, whisper_loss=0.09157, over 3902099.50 frames. ], batch size: 94, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:52:43,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2445730.0, ans=0.2 2024-08-14 02:52:46,673 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 02:52:50,400 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 02:53:08,199 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 02:53:12,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2446030.0, ans=0.2 2024-08-14 02:53:15,844 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 02:53:26,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2446130.0, ans=0.1 2024-08-14 02:53:27,544 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 02:53:30,106 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-14 02:53:37,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12750, loss[loss=0.1288, beats_loss=0.01014, ecapa_loss=0.0001522, whisper_loss=0.1171, over 22559.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.0001578, whisper_loss=0.09201, over 3927103.21 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:53:44,157 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 02:53:48,454 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:53:48,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2024-08-14 02:53:59,949 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 02:54:01,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=22.5 2024-08-14 02:54:06,575 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 02:54:07,759 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 02:54:14,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.458e+01 2.855e+01 3.170e+01 1.362e+02, threshold=5.709e+01, percent-clipped=3.0 2024-08-14 02:54:24,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2446530.0, ans=0.125 2024-08-14 02:54:32,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2446630.0, ans=0.1 2024-08-14 02:54:33,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2446630.0, ans=0.125 2024-08-14 02:54:42,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12800, loss[loss=0.1075, beats_loss=0.01227, ecapa_loss=0.0001701, whisper_loss=0.09357, over 22830.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01086, ecapa_loss=0.000159, whisper_loss=0.09178, over 3930497.72 frames. ], batch size: 94, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:54:46,338 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 02:54:56,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-14 02:55:14,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.35 vs. 
limit=12.0 2024-08-14 02:55:35,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2447130.0, ans=0.2 2024-08-14 02:55:39,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2447130.0, ans=0.125 2024-08-14 02:55:45,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2447130.0, ans=0.0 2024-08-14 02:55:47,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12850, loss[loss=0.1072, beats_loss=0.009132, ecapa_loss=0.0002235, whisper_loss=0.09587, over 15969.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01094, ecapa_loss=0.0001594, whisper_loss=0.09118, over 3892208.41 frames. ], batch size: 66, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:55:51,883 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 02:56:00,843 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 02:56:11,116 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 02:56:15,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2447430.0, ans=0.1 2024-08-14 02:56:23,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-08-14 02:56:23,758 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.741e+01 3.118e+01 1.301e+02, threshold=5.482e+01, percent-clipped=1.0 2024-08-14 02:56:25,506 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 02:56:26,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2024-08-14 02:56:41,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2447630.0, ans=0.2 2024-08-14 02:56:42,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2447630.0, ans=0.125 2024-08-14 02:56:43,868 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-14 02:56:53,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12900, loss[loss=0.09776, beats_loss=0.01168, ecapa_loss=0.0001585, whisper_loss=0.0845, over 22303.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01092, ecapa_loss=0.0001598, whisper_loss=0.09072, over 3880125.86 frames. ], batch size: 90, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:56:56,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2447730.0, ans=15.0 2024-08-14 02:56:57,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2447730.0, ans=0.035 2024-08-14 02:56:58,354 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 02:56:58,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2447730.0, ans=0.0 2024-08-14 02:57:01,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2447730.0, ans=0.125 2024-08-14 02:57:13,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2447830.0, ans=0.125 2024-08-14 02:57:15,462 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 02:57:58,925 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 02:58:01,623 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 12950, loss[loss=0.08367, beats_loss=0.01174, ecapa_loss=0.0001757, whisper_loss=0.07017, over 19973.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01084, ecapa_loss=0.000161, whisper_loss=0.09169, over 3911741.18 frames. ], batch size: 86, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:58:21,527 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 02:58:35,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2448430.0, ans=0.0 2024-08-14 02:58:40,024 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 02:58:41,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.282e+01 2.587e+01 2.877e+01 4.043e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 02:59:01,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2448630.0, ans=0.0 2024-08-14 02:59:08,384 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=12.0 2024-08-14 02:59:08,542 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2024-08-14 02:59:12,067 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13000, loss[loss=0.1122, beats_loss=0.01042, ecapa_loss=0.0001515, whisper_loss=0.1003, over 18002.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001608, whisper_loss=0.09132, over 3921152.55 frames. ], batch size: 69, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:59:12,218 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 34 from Vox, 29 fro AS 2024-08-14 02:59:12,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2448730.0, ans=0.0 2024-08-14 02:59:28,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2448830.0, ans=0.125 2024-08-14 02:59:33,430 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-14 02:59:41,578 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.309e-02 2024-08-14 03:00:08,174 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 03:00:26,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2449230.0, ans=0.2 2024-08-14 03:00:27,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13050, loss[loss=0.1022, beats_loss=0.01023, ecapa_loss=0.0001783, whisper_loss=0.09017, over 21220.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001621, whisper_loss=0.09141, over 3894291.67 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:00:27,417 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 03:00:47,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2449330.0, ans=0.125 2024-08-14 03:01:16,900 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 03:01:18,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.546e+01 2.785e+01 3.142e+01 1.124e+02, threshold=5.570e+01, percent-clipped=2.0 2024-08-14 03:01:39,182 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 03:01:39,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2449530.0, ans=0.1 2024-08-14 03:02:02,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.19 vs. limit=22.5 2024-08-14 03:02:03,182 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13100, loss[loss=0.1119, beats_loss=0.009199, ecapa_loss=0.0001583, whisper_loss=0.1011, over 19520.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01092, ecapa_loss=0.0001599, whisper_loss=0.0904, over 3896636.92 frames. 
], batch size: 79, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:02:38,924 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-14 03:02:42,988 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 03:02:54,139 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 03:03:05,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2450030.0, ans=0.0 2024-08-14 03:03:22,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2450030.0, ans=0.05 2024-08-14 03:03:53,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13150, loss[loss=0.07793, beats_loss=0.0136, ecapa_loss=0.0001515, whisper_loss=0.06281, over 15457.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001596, whisper_loss=0.09073, over 3886414.92 frames. ], batch size: 63, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:04:41,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2024-08-14 03:04:51,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2024-08-14 03:04:59,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2450430.0, ans=0.1 2024-08-14 03:05:09,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.343e+01 2.636e+01 2.918e+01 3.888e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 03:05:24,010 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
15 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 03:05:46,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2450630.0, ans=0.1 2024-08-14 03:06:08,975 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13200, loss[loss=0.084, beats_loss=0.00948, ecapa_loss=0.0002199, whisper_loss=0.07232, over 18272.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0109, ecapa_loss=0.0001595, whisper_loss=0.09015, over 3862940.23 frames. ], batch size: 80, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:06:18,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2450730.0, ans=0.025 2024-08-14 03:06:33,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2450830.0, ans=0.125 2024-08-14 03:06:50,918 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 03:06:56,129 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 03:07:04,197 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 03:07:21,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2450930.0, ans=0.0 2024-08-14 03:07:53,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2451130.0, ans=0.125 2024-08-14 03:08:07,233 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 03:08:09,372 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 03:08:16,150 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13250, loss[loss=0.1035, beats_loss=0.009761, ecapa_loss=0.0001222, whisper_loss=0.0925, over 19376.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01084, ecapa_loss=0.00016, whisper_loss=0.09025, over 3841603.60 frames. ], batch size: 71, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:08:40,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-14 03:08:42,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2451330.0, ans=0.125 2024-08-14 03:09:27,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.488e+01 2.774e+01 3.161e+01 6.895e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 03:10:01,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2451730.0, ans=0.07 2024-08-14 03:10:01,987 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13300, loss[loss=0.08244, beats_loss=0.008348, ecapa_loss=0.0001854, whisper_loss=0.07224, over 14767.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001604, whisper_loss=0.09056, over 3864618.31 frames. 
], batch size: 58, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:10:12,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2451730.0, ans=0.05 2024-08-14 03:10:24,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2451830.0, ans=0.125 2024-08-14 03:10:24,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2451830.0, ans=0.2 2024-08-14 03:10:38,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2451930.0, ans=0.0 2024-08-14 03:11:15,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2452130.0, ans=0.0 2024-08-14 03:11:23,654 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 03:11:23,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2452130.0, ans=0.125 2024-08-14 03:11:26,830 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13350, loss[loss=0.07622, beats_loss=0.01485, ecapa_loss=0.000131, whisper_loss=0.06006, over 16100.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.0001593, whisper_loss=0.09018, over 3849017.38 frames. ], batch size: 65, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:11:30,480 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 03:11:35,701 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 03:11:42,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2452330.0, ans=0.125 2024-08-14 03:11:50,806 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 03:12:07,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2452430.0, ans=0.0 2024-08-14 03:12:12,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.377e+01 2.695e+01 3.024e+01 3.722e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-14 03:12:17,698 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 03:12:32,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2452630.0, ans=0.125 2024-08-14 03:12:40,110 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 03:12:48,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13400, loss[loss=0.07802, beats_loss=0.01445, ecapa_loss=0.0001153, whisper_loss=0.06242, over 18653.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001591, whisper_loss=0.09079, over 3854485.80 frames. ], batch size: 75, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:12:48,515 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 03:12:50,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2452730.0, ans=0.125 2024-08-14 03:12:52,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2452730.0, ans=0.95 2024-08-14 03:13:13,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2452830.0, ans=0.125 2024-08-14 03:13:19,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2452930.0, ans=0.125 2024-08-14 03:13:39,136 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06988123059272766, model_norm_threshold=53.90802001953125 2024-08-14 03:13:39,349 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.512e+05, grad_sumsq=1.512e+05, orig_rms_sq=1.000e+00 2024-08-14 03:13:58,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2453130.0, ans=0.125 2024-08-14 03:14:09,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13450, loss[loss=0.1155, beats_loss=0.009503, ecapa_loss=0.0001714, whisper_loss=0.1043, over 23871.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001586, whisper_loss=0.09076, over 3859652.40 frames. ], batch size: 94, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:14:39,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2453330.0, ans=0.0 2024-08-14 03:14:39,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. 
limit=15.0 2024-08-14 03:14:47,400 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 03:14:55,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.489e+01 2.722e+01 3.204e+01 7.714e+02, threshold=5.444e+01, percent-clipped=1.0 2024-08-14 03:15:09,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2453530.0, ans=0.125 2024-08-14 03:15:12,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2453630.0, ans=0.125 2024-08-14 03:15:26,183 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 03:15:27,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13500, loss[loss=0.09477, beats_loss=0.01186, ecapa_loss=0.0001677, whisper_loss=0.08124, over 20820.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.000158, whisper_loss=0.09072, over 3862557.73 frames. ], batch size: 88, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:15:37,766 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-14 03:15:41,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2453830.0, ans=0.125 2024-08-14 03:15:52,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2453830.0, ans=0.0 2024-08-14 03:16:01,246 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 03:16:06,987 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 03:16:20,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2454030.0, ans=0.125 2024-08-14 03:16:24,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2454130.0, ans=0.1 2024-08-14 03:16:33,947 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 03:16:36,641 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13550, loss[loss=0.09806, beats_loss=0.01174, ecapa_loss=0.0001387, whisper_loss=0.08493, over 22161.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001579, whisper_loss=0.09123, over 3853674.61 frames. ], batch size: 88, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:16:43,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2454230.0, ans=0.05 2024-08-14 03:16:51,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2454330.0, ans=0.5 2024-08-14 03:16:51,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2024-08-14 03:16:59,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2454330.0, ans=0.125 2024-08-14 03:17:12,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.332e+01 2.621e+01 2.776e+01 5.086e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-14 03:17:12,914 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
18 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 03:17:16,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=22.5 2024-08-14 03:17:24,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=22.5 2024-08-14 03:17:41,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13600, loss[loss=0.08801, beats_loss=0.01181, ecapa_loss=0.000159, whisper_loss=0.07461, over 21990.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01081, ecapa_loss=0.0001566, whisper_loss=0.09061, over 3873578.20 frames. ], batch size: 91, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:17:47,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2454730.0, ans=0.125 2024-08-14 03:17:49,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2454730.0, ans=0.125 2024-08-14 03:18:21,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2455030.0, ans=0.125 2024-08-14 03:18:46,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13650, loss[loss=0.08818, beats_loss=0.01311, ecapa_loss=0.0001228, whisper_loss=0.07384, over 20850.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.000157, whisper_loss=0.09119, over 3862920.48 frames. 
], batch size: 79, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:19:03,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2455330.0, ans=0.04949747468305833 2024-08-14 03:19:24,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.360e+01 2.649e+01 3.081e+01 1.605e+02, threshold=5.298e+01, percent-clipped=1.0 2024-08-14 03:19:32,251 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 03:19:48,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2455630.0, ans=0.0 2024-08-14 03:19:51,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-14 03:19:52,401 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 03:19:57,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13700, loss[loss=0.09094, beats_loss=0.01421, ecapa_loss=0.0001309, whisper_loss=0.07543, over 17071.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001569, whisper_loss=0.09167, over 3855173.08 frames. ], batch size: 68, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:20:28,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2455930.0, ans=0.125 2024-08-14 03:21:10,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2456230.0, ans=6.0 2024-08-14 03:21:10,972 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13750, loss[loss=0.103, beats_loss=0.009747, ecapa_loss=0.0001669, whisper_loss=0.0916, over 18937.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001575, whisper_loss=0.09079, over 3817633.50 frames. ], batch size: 75, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:21:21,651 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 13 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 03:21:53,610 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 03:21:54,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.289e+01 2.530e+01 2.894e+01 7.886e+01, threshold=5.061e+01, percent-clipped=1.0 2024-08-14 03:21:56,396 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 03:22:01,469 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 03:22:28,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13800, loss[loss=0.07576, beats_loss=0.0123, ecapa_loss=0.0001666, whisper_loss=0.0618, over 14190.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.0001575, whisper_loss=0.09052, over 3828806.00 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:22:53,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2456830.0, ans=0.0 2024-08-14 03:22:53,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2024-08-14 03:23:10,365 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 03:23:13,017 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-14 03:23:40,373 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 03:23:41,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2024-08-14 03:23:44,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2457130.0, ans=0.0 2024-08-14 03:23:48,975 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13850, loss[loss=0.1121, beats_loss=0.008058, ecapa_loss=0.0001906, whisper_loss=0.1021, over 15707.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001587, whisper_loss=0.09131, over 3852930.03 frames. ], batch size: 62, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:23:49,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2457230.0, ans=0.0 2024-08-14 03:23:58,790 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 03:24:00,295 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 03:24:14,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2457330.0, ans=0.0 2024-08-14 03:24:22,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.30 vs. limit=22.5 2024-08-14 03:24:32,595 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 03:24:33,983 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 03:24:35,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.485e+01 2.798e+01 3.130e+01 4.713e+02, threshold=5.595e+01, percent-clipped=2.0 2024-08-14 03:24:48,377 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
27 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 03:25:10,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2457730.0, ans=0.0 2024-08-14 03:25:11,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13900, loss[loss=0.08463, beats_loss=0.009882, ecapa_loss=0.0001481, whisper_loss=0.07326, over 15392.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01073, ecapa_loss=0.0001596, whisper_loss=0.09118, over 3873758.58 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:25:16,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=22.5 2024-08-14 03:25:23,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-08-14 03:25:36,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2457830.0, ans=15.0 2024-08-14 03:26:10,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=12.0 2024-08-14 03:26:34,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 13950, loss[loss=0.1117, beats_loss=0.01154, ecapa_loss=0.0001758, whisper_loss=0.09844, over 22918.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01074, ecapa_loss=0.0001596, whisper_loss=0.0914, over 3886117.57 frames. ], batch size: 93, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:26:36,146 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 03:26:46,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2458230.0, ans=0.0 2024-08-14 03:26:55,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2458330.0, ans=0.0 2024-08-14 03:27:05,631 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2024-08-14 03:27:19,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.317e+01 2.641e+01 2.864e+01 9.900e+01, threshold=5.282e+01, percent-clipped=1.0 2024-08-14 03:27:51,428 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-14 03:27:51,653 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.116e-02 2024-08-14 03:27:52,765 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14000, loss[loss=0.09582, beats_loss=0.01299, ecapa_loss=0.0001317, whisper_loss=0.08151, over 21171.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001587, whisper_loss=0.09157, over 3906952.71 frames. ], batch size: 81, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:27:55,111 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:28:17,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2458830.0, ans=0.0 2024-08-14 03:28:18,942 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-14 03:28:21,019 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 03:28:33,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2458930.0, ans=0.2 2024-08-14 03:28:37,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.13 vs. limit=10.0 2024-08-14 03:28:44,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2459030.0, ans=0.07 2024-08-14 03:28:44,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-14 03:28:54,001 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 26 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 03:29:04,974 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 03:29:07,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2459130.0, ans=0.0 2024-08-14 03:29:10,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2459130.0, ans=0.125 2024-08-14 03:29:11,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2459230.0, ans=0.0 2024-08-14 03:29:12,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14050, loss[loss=0.1293, beats_loss=0.01132, ecapa_loss=0.0001347, whisper_loss=0.1166, over 23762.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001577, whisper_loss=0.09218, over 3900244.78 frames. ], batch size: 90, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:29:12,801 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 03:29:16,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-14 03:29:16,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-08-14 03:29:21,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2459230.0, ans=0.125 2024-08-14 03:29:22,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2459230.0, ans=0.05 2024-08-14 03:29:35,195 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2024-08-14 03:29:58,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.432e+01 2.589e+01 2.887e+01 3.706e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 03:30:07,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.40 vs. limit=15.0 2024-08-14 03:30:12,293 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-14 03:30:27,907 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 03:30:31,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14100, loss[loss=0.1009, beats_loss=0.01401, ecapa_loss=0.0001784, whisper_loss=0.08509, over 22047.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.0001569, whisper_loss=0.09193, over 3891382.77 frames. 
], batch size: 93, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:30:37,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2459730.0, ans=0.125 2024-08-14 03:30:54,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2459830.0, ans=0.0 2024-08-14 03:30:59,766 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 03:31:03,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2459930.0, ans=0.125 2024-08-14 03:31:09,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2459930.0, ans=0.125 2024-08-14 03:31:11,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2459930.0, ans=0.125 2024-08-14 03:31:14,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2459930.0, ans=0.125 2024-08-14 03:31:16,059 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-14 03:31:27,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2460030.0, ans=10.0 2024-08-14 03:31:49,702 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 03:31:52,550 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14150, loss[loss=0.1089, beats_loss=0.01323, ecapa_loss=0.0001175, whisper_loss=0.09446, over 16590.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01073, ecapa_loss=0.0001571, whisper_loss=0.09246, over 3877549.23 frames. 
], batch size: 61, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:31:53,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2460230.0, ans=0.125 2024-08-14 03:32:02,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2460230.0, ans=0.0 2024-08-14 03:32:25,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2460430.0, ans=0.125 2024-08-14 03:32:25,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2460430.0, ans=0.0 2024-08-14 03:32:26,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=12.0 2024-08-14 03:32:36,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2460430.0, ans=0.125 2024-08-14 03:32:40,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.331e+01 2.560e+01 2.927e+01 4.747e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-14 03:32:47,747 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 03:33:10,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2460630.0, ans=0.125 2024-08-14 03:33:12,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-14 03:33:16,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14200, loss[loss=0.1155, beats_loss=0.00973, ecapa_loss=0.0001663, whisper_loss=0.1041, over 21557.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001572, whisper_loss=0.09224, over 3892750.10 frames. 
], batch size: 88, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:33:34,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2460830.0, ans=0.1 2024-08-14 03:33:39,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2460830.0, ans=0.125 2024-08-14 03:33:47,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2460830.0, ans=0.5 2024-08-14 03:34:13,235 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 03:34:38,748 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 03:34:40,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14250, loss[loss=0.08743, beats_loss=0.01439, ecapa_loss=0.0001305, whisper_loss=0.07173, over 16451.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01067, ecapa_loss=0.0001567, whisper_loss=0.09272, over 3898145.08 frames. ], batch size: 67, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:34:40,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-14 03:34:43,232 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 03:34:44,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2461230.0, ans=10.0 2024-08-14 03:35:00,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2461330.0, ans=0.125 2024-08-14 03:35:09,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2024-08-14 03:35:25,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.288e+01 2.518e+01 2.897e+01 5.060e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-14 03:35:42,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2461630.0, ans=0.0 2024-08-14 03:35:46,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2461630.0, ans=0.5 2024-08-14 03:35:59,765 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14300, loss[loss=0.12, beats_loss=0.009532, ecapa_loss=0.0001586, whisper_loss=0.1088, over 21008.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01069, ecapa_loss=0.0001565, whisper_loss=0.09212, over 3885058.20 frames. ], batch size: 81, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:36:01,842 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-14 03:36:03,776 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2024-08-14 03:36:08,133 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
28 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 03:36:10,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2461730.0, ans=0.0 2024-08-14 03:36:13,632 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 03:36:16,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2461830.0, ans=0.0 2024-08-14 03:36:34,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2461930.0, ans=0.125 2024-08-14 03:36:38,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2461930.0, ans=0.07 2024-08-14 03:36:50,861 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 33 from Vox, 25 fro AS 2024-08-14 03:36:51,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2024-08-14 03:37:00,519 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 03:37:08,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2462130.0, ans=0.125 2024-08-14 03:37:11,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2462130.0, ans=0.125 2024-08-14 03:37:12,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2462130.0, ans=15.0 2024-08-14 03:37:18,369 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14350, loss[loss=0.09628, beats_loss=0.007302, ecapa_loss=0.0002206, whisper_loss=0.08677, over 13006.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001564, whisper_loss=0.09166, over 3880215.85 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:37:18,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2462230.0, ans=0.125 2024-08-14 03:37:21,873 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 03:37:37,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2462330.0, ans=0.125 2024-08-14 03:37:44,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2462330.0, ans=0.125 2024-08-14 03:37:46,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2024-08-14 03:37:55,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2462430.0, ans=0.125 2024-08-14 03:37:57,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2462430.0, ans=0.0 2024-08-14 03:38:03,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.518e+01 2.731e+01 3.066e+01 7.073e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-14 03:38:20,568 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 03:38:21,847 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 30 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 03:38:33,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. 
limit=10.0 2024-08-14 03:38:36,967 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14400, loss[loss=0.134, beats_loss=0.009063, ecapa_loss=0.0001331, whisper_loss=0.1236, over 23881.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001579, whisper_loss=0.09145, over 3916001.71 frames. ], batch size: 86, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:39:05,265 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 03:39:08,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2462930.0, ans=0.2 2024-08-14 03:39:08,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2462930.0, ans=0.125 2024-08-14 03:39:33,875 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 26 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-14 03:39:44,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2463130.0, ans=0.2 2024-08-14 03:39:50,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2463130.0, ans=0.125 2024-08-14 03:39:53,619 INFO [train_multi_KD3.py:1116] (1/4) Epoch 17, batch 14450, loss[loss=0.1196, beats_loss=0.008857, ecapa_loss=0.0001585, whisper_loss=0.1092, over 23161.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001582, whisper_loss=0.09077, over 3904120.17 frames. ], batch size: 90, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:39:56,729 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-14 03:40:04,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2463230.0, ans=0.1 2024-08-14 03:40:20,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2463330.0, ans=0.1 2024-08-14 03:40:21,491 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 03:40:40,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.412e+01 2.642e+01 2.928e+01 4.301e+01, threshold=5.284e+01, percent-clipped=0.0 2024-08-14 03:40:42,059 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 03:40:58,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2463630.0, ans=0.125 2024-08-14 03:41:52,886 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 0, loss[loss=0.1269, beats_loss=0.007424, ecapa_loss=0.0001717, whisper_loss=0.1178, over 14083.00 frames. ], tot_loss[loss=0.1269, beats_loss=0.007424, ecapa_loss=0.0001717, whisper_loss=0.1178, over 14083.00 frames. ], batch size: 54, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:41:52,886 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 03:42:32,759 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005528, whisper_loss=0.2483, over 922467.00 frames. 
2024-08-14 03:42:43,703 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([9.8827e-04, 2.8607e-02, 8.6648e-03, 2.4886e+00, 3.2009e-03, 2.8083e-02, 3.2964e-02, 2.9425e-02], device='cuda:1') 2024-08-14 03:42:48,580 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on SV_voxceleb1: loss=0.004396, beats_loss=0, ecapa_loss=0.0004396, whisper_loss=0, over 939242.00 frames. 2024-08-14 03:44:37,183 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 03:44:37,186 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 03:45:07,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2463820.0, ans=0.0 2024-08-14 03:45:33,073 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 03:46:01,972 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 03:46:02,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2464020.0, ans=0.125 2024-08-14 03:46:07,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2464020.0, ans=0.125 2024-08-14 03:46:14,343 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 27 from Vox, 15 fro AS 2024-08-14 03:46:19,007 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 03:46:26,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2464120.0, ans=0.125 2024-08-14 03:46:40,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 50, loss[loss=0.1162, beats_loss=0.008001, ecapa_loss=0.0001872, whisper_loss=0.1063, over 22579.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.009613, ecapa_loss=0.0001624, whisper_loss=0.09402, over 906296.52 frames. ], batch size: 87, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:46:51,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2464220.0, ans=0.1 2024-08-14 03:46:57,484 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 03:47:16,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-14 03:47:23,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2464420.0, ans=0.125 2024-08-14 03:47:25,608 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.098e+01 2024-08-14 03:47:45,102 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 03:47:47,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.625e+01 2.934e+01 3.274e+01 1.725e+02, threshold=5.869e+01, percent-clipped=1.0 2024-08-14 03:47:47,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2464520.0, ans=0.1 2024-08-14 03:47:57,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2464520.0, ans=0.125 2024-08-14 03:48:07,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2464620.0, ans=0.05 2024-08-14 03:48:11,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2464620.0, ans=0.125 2024-08-14 03:48:25,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2464620.0, ans=0.1 2024-08-14 03:48:27,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2464620.0, ans=0.1 2024-08-14 03:48:31,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 100, loss[loss=0.1053, beats_loss=0.009523, ecapa_loss=0.0001743, whisper_loss=0.09408, over 23377.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.009524, ecapa_loss=0.000162, whisper_loss=0.09265, over 1555344.97 frames. ], batch size: 95, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:48:51,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2464820.0, ans=0.0 2024-08-14 03:49:03,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.58 vs. 
limit=15.0 2024-08-14 03:49:07,237 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 16 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 03:49:11,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2464820.0, ans=0.0 2024-08-14 03:49:41,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2465020.0, ans=0.125 2024-08-14 03:49:55,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2465120.0, ans=0.035 2024-08-14 03:49:56,545 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 03:49:56,962 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.53 vs. limit=22.5 2024-08-14 03:50:14,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 150, loss[loss=0.08015, beats_loss=0.009474, ecapa_loss=0.0001431, whisper_loss=0.06924, over 16345.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.009565, ecapa_loss=0.0001615, whisper_loss=0.09219, over 2078465.59 frames. 
], batch size: 58, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:50:20,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2465220.0, ans=0.125 2024-08-14 03:50:36,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2465320.0, ans=0.04949747468305833 2024-08-14 03:50:39,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2465320.0, ans=0.125 2024-08-14 03:50:40,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2465320.0, ans=0.1 2024-08-14 03:51:02,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.708e+01 3.001e+01 3.363e+01 1.526e+02, threshold=6.002e+01, percent-clipped=2.0 2024-08-14 03:51:15,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.55 vs. limit=22.5 2024-08-14 03:51:17,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2465620.0, ans=0.1 2024-08-14 03:51:26,127 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 03:51:26,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2024-08-14 03:51:32,417 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-14 03:51:33,969 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 200, loss[loss=0.09508, beats_loss=0.01458, ecapa_loss=0.000139, whisper_loss=0.0791, over 21997.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.009895, ecapa_loss=0.0001595, whisper_loss=0.09142, over 2460848.70 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:51:42,606 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 03:51:45,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2465720.0, ans=0.0 2024-08-14 03:52:08,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2465920.0, ans=0.125 2024-08-14 03:52:23,524 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 03:52:29,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2466020.0, ans=0.2 2024-08-14 03:52:44,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2024-08-14 03:52:52,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2466120.0, ans=0.0 2024-08-14 03:52:55,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 250, loss[loss=0.1137, beats_loss=0.01064, ecapa_loss=0.00014, whisper_loss=0.1016, over 23397.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01008, ecapa_loss=0.0001593, whisper_loss=0.0914, over 2753068.59 frames. ], batch size: 93, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:52:55,168 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 03:52:57,166 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-14 03:52:58,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2466220.0, ans=0.125 2024-08-14 03:53:01,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2466220.0, ans=0.0 2024-08-14 03:53:13,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2024-08-14 03:53:17,330 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 22 from LS+wenet, 8 from Vox, 24 fro AS 2024-08-14 03:53:33,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0 2024-08-14 03:53:46,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.439e+01 2.692e+01 3.141e+01 8.859e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-14 03:54:19,667 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 300, loss[loss=0.09552, beats_loss=0.01421, ecapa_loss=0.0001629, whisper_loss=0.07968, over 20955.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01023, ecapa_loss=0.0001585, whisper_loss=0.09084, over 2959013.52 frames. ], batch size: 86, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:54:59,255 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 03:55:14,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.18 vs. limit=22.5 2024-08-14 03:55:16,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.52 vs. limit=10.0 2024-08-14 03:55:21,680 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 03:55:27,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2467120.0, ans=0.2 2024-08-14 03:55:39,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 350, loss[loss=0.09673, beats_loss=0.01134, ecapa_loss=0.0001917, whisper_loss=0.08347, over 17577.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001587, whisper_loss=0.09056, over 3149119.32 frames. ], batch size: 72, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:55:49,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.25 vs. limit=22.5 2024-08-14 03:56:15,473 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 03:56:21,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2467420.0, ans=0.125 2024-08-14 03:56:25,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.373e+01 2.538e+01 2.756e+01 1.193e+02, threshold=5.077e+01, percent-clipped=2.0 2024-08-14 03:56:28,814 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 25 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-14 03:56:32,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. 
limit=15.0 2024-08-14 03:56:33,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2467520.0, ans=0.2 2024-08-14 03:56:36,664 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.208e+01 2024-08-14 03:56:45,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2467620.0, ans=0.125 2024-08-14 03:56:55,350 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 400, loss[loss=0.1049, beats_loss=0.01185, ecapa_loss=0.0001348, whisper_loss=0.09166, over 23128.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001583, whisper_loss=0.09021, over 3297113.29 frames. ], batch size: 91, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:56:55,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2467720.0, ans=0.125 2024-08-14 03:56:58,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2467720.0, ans=0.125 2024-08-14 03:57:26,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2467920.0, ans=0.1 2024-08-14 03:57:34,391 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-14 03:57:36,325 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-14 03:57:36,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2467920.0, ans=0.0 2024-08-14 03:57:37,849 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
12 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 03:57:38,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2024-08-14 03:57:42,940 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 03:57:59,857 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 03:58:04,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-14 03:58:11,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 450, loss[loss=0.1162, beats_loss=0.01043, ecapa_loss=0.000155, whisper_loss=0.1042, over 22916.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001579, whisper_loss=0.08918, over 3431020.93 frames. ], batch size: 92, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:58:14,735 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=12.0 2024-08-14 03:58:34,444 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 03:58:39,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-14 03:58:42,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2468420.0, ans=0.1 2024-08-14 03:58:49,351 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 03:58:57,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.249e+01 2.491e+01 2.829e+01 3.988e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-14 03:58:58,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2468520.0, ans=0.2 2024-08-14 03:59:03,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2468520.0, ans=0.0 2024-08-14 03:59:06,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2468520.0, ans=0.0 2024-08-14 03:59:28,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 500, loss[loss=0.1061, beats_loss=0.007191, ecapa_loss=0.0001583, whisper_loss=0.09729, over 15793.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.000156, whisper_loss=0.08983, over 3519470.83 frames. ], batch size: 59, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:59:32,476 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-14 03:59:32,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2468720.0, ans=0.1 2024-08-14 03:59:59,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2468920.0, ans=0.0 2024-08-14 03:59:59,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.20 vs. limit=15.0 2024-08-14 04:00:05,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2024-08-14 04:00:07,692 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 04:00:07,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2468920.0, ans=0.1 2024-08-14 04:00:14,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2469020.0, ans=0.125 2024-08-14 04:00:31,983 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 04:00:33,566 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 04:00:44,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2469220.0, ans=0.1 2024-08-14 04:00:45,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 550, loss[loss=0.1135, beats_loss=0.009476, ecapa_loss=0.0001603, whisper_loss=0.1024, over 18451.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001559, whisper_loss=0.09044, over 3626252.66 frames. ], batch size: 74, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:01:02,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2469320.0, ans=0.125 2024-08-14 04:01:12,488 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 04:01:17,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2469420.0, ans=0.125 2024-08-14 04:01:27,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2469420.0, ans=0.1 2024-08-14 04:01:32,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.764e+01 3.145e+01 1.301e+02, threshold=5.528e+01, percent-clipped=2.0 2024-08-14 04:01:37,010 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-14 04:01:38,466 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 04:01:58,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-08-14 04:02:01,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 600, loss[loss=0.1033, beats_loss=0.01043, ecapa_loss=0.0001705, whisper_loss=0.09118, over 18681.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001559, whisper_loss=0.09057, over 3662821.14 frames. ], batch size: 75, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:02:12,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2469720.0, ans=0.125 2024-08-14 04:02:31,032 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
24 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 04:02:42,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2469920.0, ans=0.0 2024-08-14 04:02:58,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2470020.0, ans=0.05 2024-08-14 04:03:02,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2470120.0, ans=0.1 2024-08-14 04:03:15,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 650, loss[loss=0.1075, beats_loss=0.008897, ecapa_loss=0.0001888, whisper_loss=0.09669, over 17381.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001548, whisper_loss=0.09059, over 3699190.56 frames. ], batch size: 72, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:03:21,893 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 04:03:22,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2470220.0, ans=0.125 2024-08-14 04:03:32,660 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 04:03:37,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2470320.0, ans=0.125 2024-08-14 04:03:43,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2470320.0, ans=0.0 2024-08-14 04:03:49,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2470420.0, ans=0.125 2024-08-14 04:04:00,788 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 04:04:02,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.415e+01 2.559e+01 3.017e+01 4.730e+01, threshold=5.119e+01, percent-clipped=1.0 2024-08-14 04:04:04,307 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 04:04:20,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2470620.0, ans=0.0 2024-08-14 04:04:21,027 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-14 04:04:30,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=15.0 2024-08-14 04:04:32,382 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 700, loss[loss=0.1118, beats_loss=0.00975, ecapa_loss=0.0001596, whisper_loss=0.1005, over 23136.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001551, whisper_loss=0.09075, over 3748508.33 frames. ], batch size: 92, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:04:32,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2470720.0, ans=0.125 2024-08-14 04:05:08,737 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-14 04:05:11,677 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 32 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 04:05:22,250 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 04:05:29,708 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 04:05:32,899 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
20 from LS+wenet, 7 from Vox, 29 fro AS 2024-08-14 04:05:40,370 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 04:05:47,326 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 750, loss[loss=0.1043, beats_loss=0.01118, ecapa_loss=0.0001231, whisper_loss=0.09189, over 20377.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001547, whisper_loss=0.09004, over 3760349.63 frames. ], batch size: 77, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:06:09,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2471320.0, ans=0.0 2024-08-14 04:06:26,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2471420.0, ans=0.1 2024-08-14 04:06:30,413 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 04:06:31,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.490e+01 2.820e+01 4.318e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-14 04:06:34,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2471520.0, ans=0.125 2024-08-14 04:06:36,989 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 04:07:01,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 800, loss[loss=0.1059, beats_loss=0.009041, ecapa_loss=0.000178, whisper_loss=0.0951, over 18210.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001556, whisper_loss=0.09013, over 3793144.46 frames. 
], batch size: 74, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:07:14,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2471720.0, ans=0.0 2024-08-14 04:07:15,019 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 04:07:24,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2471820.0, ans=0.125 2024-08-14 04:07:37,636 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-14 04:07:42,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2471920.0, ans=0.2 2024-08-14 04:08:02,890 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 04:08:15,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2472120.0, ans=0.0 2024-08-14 04:08:16,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2472220.0, ans=0.125 2024-08-14 04:08:17,675 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 850, loss[loss=0.1298, beats_loss=0.008138, ecapa_loss=0.0001698, whisper_loss=0.12, over 18183.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001548, whisper_loss=0.08961, over 3771028.31 frames. ], batch size: 70, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:08:20,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-08-14 04:08:26,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.82 vs. 
limit=12.0 2024-08-14 04:08:30,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2472220.0, ans=0.2 2024-08-14 04:08:42,554 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-14 04:08:50,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2472420.0, ans=0.2 2024-08-14 04:08:52,660 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 04:08:54,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2472420.0, ans=0.0 2024-08-14 04:09:01,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.448e+01 2.671e+01 3.055e+01 4.887e+01, threshold=5.342e+01, percent-clipped=0.0 2024-08-14 04:09:05,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2472520.0, ans=0.125 2024-08-14 04:09:10,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2472520.0, ans=0.125 2024-08-14 04:09:33,194 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 900, loss[loss=0.07516, beats_loss=0.01384, ecapa_loss=0.0001766, whisper_loss=0.05955, over 21251.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01062, ecapa_loss=0.0001539, whisper_loss=0.08947, over 3797067.27 frames. ], batch size: 92, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:09:35,204 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 04:09:37,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. 
limit=15.0 2024-08-14 04:09:48,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2024-08-14 04:09:59,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-14 04:10:13,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.92 vs. limit=10.0 2024-08-14 04:10:27,883 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 04:10:29,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2473020.0, ans=0.035 2024-08-14 04:10:45,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2473120.0, ans=0.0 2024-08-14 04:10:50,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 950, loss[loss=0.0987, beats_loss=0.01092, ecapa_loss=0.0001564, whisper_loss=0.08621, over 22216.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001539, whisper_loss=0.09004, over 3795549.70 frames. ], batch size: 89, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:11:04,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2473320.0, ans=0.125 2024-08-14 04:11:14,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. 
limit=15.0 2024-08-14 04:11:25,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2473420.0, ans=0.1 2024-08-14 04:11:35,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.279e+01 2.588e+01 3.016e+01 4.728e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 04:11:57,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2473620.0, ans=0.125 2024-08-14 04:12:01,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2024-08-14 04:12:04,789 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1000, loss[loss=0.1308, beats_loss=0.008682, ecapa_loss=0.0001612, whisper_loss=0.1205, over 23428.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01078, ecapa_loss=0.0001526, whisper_loss=0.08913, over 3792103.88 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:12:06,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2473720.0, ans=10.0 2024-08-14 04:12:14,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2473720.0, ans=0.0 2024-08-14 04:12:23,466 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 04:12:28,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2473820.0, ans=0.1 2024-08-14 04:13:03,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2474020.0, ans=0.2 2024-08-14 04:13:05,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2474120.0, ans=0.125 2024-08-14 04:13:21,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1050, loss[loss=0.1128, beats_loss=0.01146, ecapa_loss=0.0001637, whisper_loss=0.09968, over 20027.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001523, whisper_loss=0.09007, over 3838358.81 frames. ], batch size: 83, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:13:41,108 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 04:13:43,798 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 04:13:59,041 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 04:14:07,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.30 vs. limit=10.0 2024-08-14 04:14:08,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.402e+01 2.807e+01 3.075e+01 7.896e+01, threshold=5.614e+01, percent-clipped=1.0 2024-08-14 04:14:38,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1100, loss[loss=0.1093, beats_loss=0.009286, ecapa_loss=0.0001934, whisper_loss=0.09811, over 20451.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.000153, whisper_loss=0.09062, over 3853382.82 frames. 
], batch size: 84, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:14:56,375 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 04:14:57,875 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 04:15:12,571 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 04:15:21,927 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 04:15:52,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1150, loss[loss=0.093, beats_loss=0.01174, ecapa_loss=0.0002075, whisper_loss=0.07918, over 20521.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.000154, whisper_loss=0.09027, over 3864442.03 frames. ], batch size: 89, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:16:07,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0 2024-08-14 04:16:07,972 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 04:16:16,711 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 04:16:38,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.338e+01 2.593e+01 2.937e+01 5.602e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 04:16:40,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2475520.0, ans=0.1 2024-08-14 04:16:43,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2475520.0, ans=10.0 2024-08-14 04:16:50,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-08-14 04:16:51,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2475620.0, ans=0.125 2024-08-14 04:16:56,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2475620.0, ans=0.125 2024-08-14 04:17:07,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1200, loss[loss=0.1092, beats_loss=0.01123, ecapa_loss=0.0001294, whisper_loss=0.09668, over 16168.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001535, whisper_loss=0.09014, over 3852755.47 frames. ], batch size: 62, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:17:18,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-14 04:17:20,847 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-14 04:17:38,141 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 04:17:53,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2476020.0, ans=0.0 2024-08-14 04:17:59,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2476020.0, ans=0.5 2024-08-14 04:18:04,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2476120.0, ans=0.0 2024-08-14 04:18:06,398 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 04:18:08,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2476120.0, ans=0.1 2024-08-14 04:18:13,907 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 04:18:21,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1250, loss[loss=0.1048, beats_loss=0.01081, ecapa_loss=0.0001352, whisper_loss=0.09263, over 15757.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001539, whisper_loss=0.09009, over 3835504.81 frames. ], batch size: 61, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:18:25,309 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:18:31,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-14 04:18:53,930 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 04:18:57,240 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 04:19:00,488 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 04:19:07,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.365e+01 2.557e+01 2.889e+01 4.348e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 04:19:25,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2476620.0, ans=0.0 2024-08-14 04:19:29,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2476620.0, ans=0.125 2024-08-14 04:19:38,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1300, loss[loss=0.09972, beats_loss=0.01118, ecapa_loss=0.0001611, whisper_loss=0.08693, over 21831.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01081, ecapa_loss=0.0001538, whisper_loss=0.08952, over 3850343.44 frames. ], batch size: 89, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:19:52,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2476820.0, ans=0.125 2024-08-14 04:20:08,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2476920.0, ans=0.0 2024-08-14 04:20:19,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2476920.0, ans=6.0 2024-08-14 04:20:19,706 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 38 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 04:20:55,394 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1350, loss[loss=0.09059, beats_loss=0.01055, ecapa_loss=0.0001705, whisper_loss=0.07833, over 18479.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01076, ecapa_loss=0.0001541, whisper_loss=0.08988, over 3843748.77 frames. 
], batch size: 77, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:21:30,801 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 04:21:38,670 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 04:21:40,142 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 04:21:41,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.297e+01 2.516e+01 2.764e+01 4.025e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 04:21:51,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.66 vs. limit=10.0 2024-08-14 04:22:11,502 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1400, loss[loss=0.1192, beats_loss=0.01005, ecapa_loss=0.0001411, whisper_loss=0.1078, over 22640.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01081, ecapa_loss=0.0001533, whisper_loss=0.08928, over 3839996.30 frames. ], batch size: 90, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:22:25,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2477820.0, ans=0.125 2024-08-14 04:22:32,498 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 04:22:33,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-08-14 04:22:34,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. 
limit=15.0 2024-08-14 04:22:40,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2477920.0, ans=0.09899494936611666 2024-08-14 04:22:48,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2477920.0, ans=0.1 2024-08-14 04:23:00,753 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-08-14 04:23:09,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2478020.0, ans=0.07 2024-08-14 04:23:17,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2478120.0, ans=0.125 2024-08-14 04:24:06,606 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1450, loss[loss=0.08565, beats_loss=0.0134, ecapa_loss=0.0001701, whisper_loss=0.07055, over 22070.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0107, ecapa_loss=0.0001531, whisper_loss=0.08931, over 3807076.07 frames. ], batch size: 94, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:24:11,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2478220.0, ans=0.125 2024-08-14 04:24:32,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2478320.0, ans=0.05 2024-08-14 04:24:36,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2478320.0, ans=0.125 2024-08-14 04:24:37,166 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 04:24:40,364 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
32 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 04:24:55,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.310e+01 2.554e+01 2.920e+01 4.164e+01, threshold=5.108e+01, percent-clipped=0.0 2024-08-14 04:24:58,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2478520.0, ans=0.1 2024-08-14 04:25:14,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2478620.0, ans=0.125 2024-08-14 04:25:16,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2478620.0, ans=0.0 2024-08-14 04:25:17,389 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 04:25:22,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2478620.0, ans=0.125 2024-08-14 04:25:28,284 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2024-08-14 04:25:29,152 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1500, loss[loss=0.09033, beats_loss=0.01444, ecapa_loss=0.0001206, whisper_loss=0.07468, over 22025.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01076, ecapa_loss=0.0001529, whisper_loss=0.08866, over 3808791.34 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:25:49,826 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 04:25:59,804 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 04:26:07,943 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:26:15,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2478920.0, ans=0.125 2024-08-14 04:26:29,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2479020.0, ans=0.125 2024-08-14 04:26:30,751 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 04:26:30,999 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:26:35,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2479120.0, ans=0.0 2024-08-14 04:26:39,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2479120.0, ans=10.0 2024-08-14 04:26:50,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1550, loss[loss=0.1027, beats_loss=0.01248, ecapa_loss=0.000133, whisper_loss=0.08893, over 18189.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01072, ecapa_loss=0.0001524, whisper_loss=0.08945, over 3839658.73 frames. 
], batch size: 72, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:27:14,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2479320.0, ans=0.1 2024-08-14 04:27:39,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.208e+01 2.513e+01 2.710e+01 4.785e+01, threshold=5.026e+01, percent-clipped=0.0 2024-08-14 04:28:02,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2479620.0, ans=0.2 2024-08-14 04:28:02,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2479620.0, ans=0.0 2024-08-14 04:28:06,537 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 04:28:06,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2479620.0, ans=0.2 2024-08-14 04:28:08,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2479620.0, ans=0.0 2024-08-14 04:28:11,011 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1600, loss[loss=0.1264, beats_loss=0.009429, ecapa_loss=0.0001263, whisper_loss=0.1157, over 22771.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001519, whisper_loss=0.09022, over 3860087.25 frames. ], batch size: 85, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:28:25,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-08-14 04:28:27,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.88 vs. 
limit=15.0 2024-08-14 04:28:31,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2479820.0, ans=0.0 2024-08-14 04:28:45,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. limit=15.0 2024-08-14 04:29:01,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2480020.0, ans=0.125 2024-08-14 04:29:08,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.38 vs. limit=15.0 2024-08-14 04:29:09,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2480020.0, ans=0.125 2024-08-14 04:29:13,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2480020.0, ans=0.025 2024-08-14 04:29:17,098 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 04:29:29,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2480120.0, ans=0.1 2024-08-14 04:29:31,728 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1650, loss[loss=0.07482, beats_loss=0.01286, ecapa_loss=0.0001404, whisper_loss=0.06055, over 16408.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001509, whisper_loss=0.09076, over 3893704.76 frames. ], batch size: 66, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:29:32,084 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
16 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 04:29:40,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2480220.0, ans=0.125 2024-08-14 04:29:40,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-14 04:29:41,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2480220.0, ans=0.125 2024-08-14 04:29:44,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2480220.0, ans=0.2 2024-08-14 04:29:47,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2480320.0, ans=0.125 2024-08-14 04:30:00,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. 
limit=15.0 2024-08-14 04:30:07,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2480420.0, ans=0.0 2024-08-14 04:30:08,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2480420.0, ans=0.125 2024-08-14 04:30:17,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.355e+01 2.575e+01 2.902e+01 4.492e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 04:30:20,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2480520.0, ans=0.125 2024-08-14 04:30:23,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2480520.0, ans=0.0 2024-08-14 04:30:34,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2480620.0, ans=0.0 2024-08-14 04:30:46,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1700, loss[loss=0.1111, beats_loss=0.007377, ecapa_loss=0.0001569, whisper_loss=0.1022, over 21271.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001518, whisper_loss=0.09157, over 3897220.02 frames. ], batch size: 81, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:30:59,340 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 04:31:11,392 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 04:31:46,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2481120.0, ans=0.0 2024-08-14 04:31:56,050 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 04:32:00,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1750, loss[loss=0.1036, beats_loss=0.00832, ecapa_loss=0.0001772, whisper_loss=0.09355, over 16243.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01044, ecapa_loss=0.0001529, whisper_loss=0.09169, over 3859992.37 frames. ], batch size: 65, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:32:02,208 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 04:32:05,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2481220.0, ans=0.125 2024-08-14 04:32:06,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2481220.0, ans=0.125 2024-08-14 04:32:09,303 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 04:32:09,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2481220.0, ans=0.125 2024-08-14 04:32:22,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2481320.0, ans=0.1 2024-08-14 04:32:27,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2481320.0, ans=0.0 2024-08-14 04:32:29,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=12.0 2024-08-14 04:32:44,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.328e+01 2.583e+01 3.000e+01 1.080e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-14 04:32:46,304 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 04:32:47,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2481520.0, ans=0.2 2024-08-14 04:32:51,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=12.0 2024-08-14 04:32:53,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2481520.0, ans=0.125 2024-08-14 04:33:06,318 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 04:33:07,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2481620.0, ans=0.0 2024-08-14 04:33:10,863 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 04:33:13,294 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1800, loss[loss=0.101, beats_loss=0.01089, ecapa_loss=0.0001222, whisper_loss=0.08891, over 22455.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001531, whisper_loss=0.09121, over 3868978.48 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:33:13,484 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 04:33:18,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2481720.0, ans=0.5 2024-08-14 04:33:21,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2481720.0, ans=0.0 2024-08-14 04:33:40,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2481820.0, ans=0.0 2024-08-14 04:33:40,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2481820.0, ans=0.125 2024-08-14 04:33:57,653 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-14 04:34:04,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-14 04:34:05,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2482020.0, ans=0.025 2024-08-14 04:34:15,569 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 9 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 04:34:27,620 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1850, loss[loss=0.08215, beats_loss=0.009478, ecapa_loss=0.0001891, whisper_loss=0.07078, over 13559.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001532, whisper_loss=0.0905, over 3844081.32 frames. 
], batch size: 55, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:34:31,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=2482220.0, ans=0.1 2024-08-14 04:34:37,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2482220.0, ans=0.1 2024-08-14 04:34:54,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2482320.0, ans=0.2 2024-08-14 04:34:57,019 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 04:35:13,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.339e+01 2.610e+01 2.958e+01 9.834e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-14 04:35:28,164 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 04:35:34,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2482620.0, ans=0.125 2024-08-14 04:35:44,541 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-14 04:35:44,963 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1900, loss[loss=0.07161, beats_loss=0.01176, ecapa_loss=0.000178, whisper_loss=0.05807, over 20664.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.09064, over 3854294.92 frames. ], batch size: 91, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:35:45,150 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
28 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 04:36:21,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2482920.0, ans=0.0 2024-08-14 04:36:22,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2482920.0, ans=0.0 2024-08-14 04:36:29,402 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 04:36:29,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2483020.0, ans=0.1 2024-08-14 04:36:32,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2483020.0, ans=0.0 2024-08-14 04:36:36,290 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.47 vs. limit=15.0 2024-08-14 04:36:46,035 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 04:36:48,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2483120.0, ans=0.0 2024-08-14 04:37:01,338 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 1950, loss[loss=0.1033, beats_loss=0.00847, ecapa_loss=0.0001322, whisper_loss=0.09347, over 19076.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001527, whisper_loss=0.09045, over 3813058.30 frames. 
], batch size: 70, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:37:01,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2483220.0, ans=0.2 2024-08-14 04:37:12,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2483220.0, ans=0.95 2024-08-14 04:37:18,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2483320.0, ans=0.125 2024-08-14 04:37:43,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2483420.0, ans=0.0 2024-08-14 04:37:43,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=12.0 2024-08-14 04:37:46,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.351e+01 2.542e+01 2.768e+01 3.987e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 04:37:46,893 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 04:37:58,702 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 04:38:08,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2483620.0, ans=0.0 2024-08-14 04:38:16,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2000, loss[loss=0.109, beats_loss=0.01009, ecapa_loss=0.0001854, whisper_loss=0.09706, over 21532.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001543, whisper_loss=0.09047, over 3803238.17 frames. 
], batch size: 91, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:38:25,198 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.081e-03 2024-08-14 04:38:31,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2483820.0, ans=0.0 2024-08-14 04:38:32,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2483820.0, ans=0.0 2024-08-14 04:38:35,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=12.0 2024-08-14 04:38:35,978 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 04:38:52,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2483920.0, ans=0.125 2024-08-14 04:39:03,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2483920.0, ans=0.1 2024-08-14 04:39:06,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2484020.0, ans=0.2 2024-08-14 04:39:16,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2484020.0, ans=0.2 2024-08-14 04:39:37,817 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2050, loss[loss=0.1245, beats_loss=0.0104, ecapa_loss=0.000167, whisper_loss=0.1124, over 22782.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001552, whisper_loss=0.09037, over 3797239.80 frames. 
], batch size: 94, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:39:38,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2484220.0, ans=0.0 2024-08-14 04:39:42,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2484220.0, ans=10.0 2024-08-14 04:39:50,900 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.949e+05 2024-08-14 04:39:54,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=12.0 2024-08-14 04:40:16,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2484420.0, ans=0.1 2024-08-14 04:40:18,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2024-08-14 04:40:25,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 2.326e+01 2.679e+01 3.072e+01 5.038e+01, threshold=5.357e+01, percent-clipped=0.0 2024-08-14 04:40:57,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2100, loss[loss=0.09728, beats_loss=0.0119, ecapa_loss=0.0001604, whisper_loss=0.08377, over 15173.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0107, ecapa_loss=0.0001548, whisper_loss=0.08974, over 3785226.78 frames. 
], batch size: 61, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:41:47,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2485020.0, ans=0.125 2024-08-14 04:41:53,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2485020.0, ans=0.1 2024-08-14 04:41:56,407 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 04:42:07,764 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 04:42:12,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2485120.0, ans=0.125 2024-08-14 04:42:14,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2485220.0, ans=0.125 2024-08-14 04:42:15,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2150, loss[loss=0.08858, beats_loss=0.01209, ecapa_loss=0.0001269, whisper_loss=0.07521, over 16566.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001529, whisper_loss=0.09045, over 3809363.97 frames. ], batch size: 64, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:42:49,218 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 17 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 04:42:51,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2485420.0, ans=0.0 2024-08-14 04:42:58,999 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 04:43:04,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.306e+01 2.493e+01 2.947e+01 5.632e+01, threshold=4.986e+01, percent-clipped=1.0 2024-08-14 04:43:09,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2485520.0, ans=0.125 2024-08-14 04:43:10,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2485520.0, ans=0.125 2024-08-14 04:43:10,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2485520.0, ans=0.125 2024-08-14 04:43:28,932 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-14 04:43:35,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2200, loss[loss=0.1207, beats_loss=0.01306, ecapa_loss=0.0001464, whisper_loss=0.1062, over 22829.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001532, whisper_loss=0.09117, over 3781241.46 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:43:43,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2485720.0, ans=0.0 2024-08-14 04:43:49,605 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 04:43:59,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2485820.0, ans=0.125 2024-08-14 04:44:03,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2485820.0, ans=0.035 2024-08-14 04:44:26,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2486020.0, ans=0.125 2024-08-14 04:44:26,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2486020.0, ans=0.0 2024-08-14 04:44:28,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2486020.0, ans=0.2 2024-08-14 04:44:29,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2486020.0, ans=0.0 2024-08-14 04:44:37,391 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 04:44:54,445 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2250, loss[loss=0.1082, beats_loss=0.01344, ecapa_loss=0.000131, whisper_loss=0.09347, over 23187.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001534, whisper_loss=0.09133, over 3800728.22 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:45:26,720 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 04:45:35,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2486420.0, ans=0.125 2024-08-14 04:45:42,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.436e+01 2.743e+01 3.250e+01 7.629e+01, threshold=5.485e+01, percent-clipped=1.0 2024-08-14 04:45:43,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2486520.0, ans=0.0 2024-08-14 04:45:51,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2486520.0, ans=0.1 2024-08-14 04:46:04,809 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 04:46:09,240 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-14 04:46:15,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2300, loss[loss=0.1076, beats_loss=0.01263, ecapa_loss=0.0001128, whisper_loss=0.09383, over 19700.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01071, ecapa_loss=0.0001536, whisper_loss=0.0918, over 3819344.53 frames. ], batch size: 75, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:46:52,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2024-08-14 04:47:34,582 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2350, loss[loss=0.07711, beats_loss=0.01092, ecapa_loss=0.0001988, whisper_loss=0.0642, over 17918.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01066, ecapa_loss=0.0001542, whisper_loss=0.09207, over 3846016.99 frames. ], batch size: 75, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:47:48,601 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-14 04:48:22,139 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.359e+01 2.626e+01 3.027e+01 4.535e+02, threshold=5.251e+01, percent-clipped=2.0 2024-08-14 04:48:42,219 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2024-08-14 04:48:45,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2487620.0, ans=0.1 2024-08-14 04:48:55,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2400, loss[loss=0.09323, beats_loss=0.01103, ecapa_loss=0.0001369, whisper_loss=0.08084, over 18993.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.000154, whisper_loss=0.09188, over 3864811.05 frames. ], batch size: 74, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:48:57,510 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 04:49:16,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2487820.0, ans=0.2 2024-08-14 04:49:20,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=22.5 2024-08-14 04:49:22,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2024-08-14 04:49:22,980 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 04:49:24,508 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
19 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-14 04:49:37,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2487920.0, ans=0.04949747468305833 2024-08-14 04:49:41,978 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.343e+01 2024-08-14 04:50:03,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-14 04:50:14,010 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2450, loss[loss=0.1227, beats_loss=0.008224, ecapa_loss=0.0001618, whisper_loss=0.1129, over 22660.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001539, whisper_loss=0.09174, over 3867025.84 frames. ], batch size: 90, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:50:25,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2488220.0, ans=0.07 2024-08-14 04:50:30,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-14 04:50:32,699 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 04:50:34,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2488320.0, ans=0.125 2024-08-14 04:50:37,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2488320.0, ans=0.2 2024-08-14 04:50:42,260 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 04:50:45,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2488420.0, ans=0.125 2024-08-14 04:50:53,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2488420.0, ans=0.1 2024-08-14 04:51:00,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.556e+01 2.864e+01 5.420e+01, threshold=5.112e+01, percent-clipped=1.0 2024-08-14 04:51:07,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2488520.0, ans=0.95 2024-08-14 04:51:18,046 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 30 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 04:51:30,885 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 04:51:31,998 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2500, loss[loss=0.113, beats_loss=0.009702, ecapa_loss=0.0001825, whisper_loss=0.1015, over 19640.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001543, whisper_loss=0.09088, over 3837663.38 frames. ], batch size: 79, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:51:40,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.20 vs. limit=22.5 2024-08-14 04:51:47,531 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 04:51:58,146 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 04:52:33,045 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
19 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-14 04:52:35,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2489120.0, ans=0.0 2024-08-14 04:52:38,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2489120.0, ans=0.125 2024-08-14 04:52:41,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2489120.0, ans=0.125 2024-08-14 04:52:53,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2550, loss[loss=0.08845, beats_loss=0.01057, ecapa_loss=0.0001796, whisper_loss=0.07608, over 16552.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001544, whisper_loss=0.09152, over 3839366.50 frames. ], batch size: 71, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:53:00,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2489220.0, ans=0.125 2024-08-14 04:53:34,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2489420.0, ans=0.0 2024-08-14 04:53:34,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2489420.0, ans=0.125 2024-08-14 04:53:34,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2489420.0, ans=0.2 2024-08-14 04:53:43,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.452e+01 2.668e+01 3.104e+01 5.723e+01, threshold=5.337e+01, percent-clipped=1.0 2024-08-14 04:53:55,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2489520.0, ans=0.2 2024-08-14 04:54:14,158 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2600, loss[loss=0.08776, 
beats_loss=0.01266, ecapa_loss=0.0001215, whisper_loss=0.07389, over 21064.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001548, whisper_loss=0.09139, over 3846830.02 frames. ], batch size: 83, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:54:16,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-14 04:54:25,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-14 04:54:33,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=12.0 2024-08-14 04:55:01,803 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 04:55:02,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2489920.0, ans=0.0 2024-08-14 04:55:02,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2489920.0, ans=0.1 2024-08-14 04:55:06,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2489920.0, ans=0.95 2024-08-14 04:55:19,437 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 04:55:43,696 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-14 04:55:51,158 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2650, loss[loss=0.1114, beats_loss=0.008068, ecapa_loss=0.0002088, whisper_loss=0.1013, over 22090.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001555, whisper_loss=0.09104, over 3842102.34 frames. 
], batch size: 88, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:56:24,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2490320.0, ans=0.0 2024-08-14 04:56:30,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2490420.0, ans=0.125 2024-08-14 04:56:44,243 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 04:56:45,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.391e+01 2.607e+01 2.986e+01 4.430e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-14 04:56:57,809 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0513666495680809, model_norm_threshold=52.13920593261719 2024-08-14 04:56:58,000 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.106e+05, grad_sumsq=1.106e+05, orig_rms_sq=1.000e+00 2024-08-14 04:57:15,271 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 04:57:18,119 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 04:57:26,886 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2700, loss[loss=0.07944, beats_loss=0.01208, ecapa_loss=0.0001508, whisper_loss=0.06585, over 21766.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001551, whisper_loss=0.09036, over 3851842.40 frames. ], batch size: 93, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:57:34,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.74 vs. 
limit=10.0 2024-08-14 04:57:54,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=22.5 2024-08-14 04:58:09,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2490820.0, ans=0.0 2024-08-14 04:58:12,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2490920.0, ans=0.125 2024-08-14 04:58:38,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2491020.0, ans=0.1 2024-08-14 04:58:49,333 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 04:59:15,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2491120.0, ans=10.0 2024-08-14 04:59:15,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.75 vs. limit=22.5 2024-08-14 04:59:21,971 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 04:59:24,247 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 04:59:26,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2750, loss[loss=0.1012, beats_loss=0.01008, ecapa_loss=0.0001369, whisper_loss=0.08979, over 21107.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001559, whisper_loss=0.09088, over 3845539.33 frames. 
], batch size: 81, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:59:37,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2491220.0, ans=0.125 2024-08-14 04:59:51,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2491320.0, ans=0.2 2024-08-14 05:00:07,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2491320.0, ans=0.1 2024-08-14 05:00:37,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.424e+01 2.607e+01 2.892e+01 1.015e+03, threshold=5.215e+01, percent-clipped=3.0 2024-08-14 05:00:59,859 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-14 05:01:04,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2491620.0, ans=10.0 2024-08-14 05:01:27,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2800, loss[loss=0.1205, beats_loss=0.007223, ecapa_loss=0.0001982, whisper_loss=0.1113, over 22828.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01057, ecapa_loss=0.0001563, whisper_loss=0.09215, over 3859928.22 frames. ], batch size: 92, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:01:39,615 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 05:01:47,729 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 05:01:50,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2491820.0, ans=0.125 2024-08-14 05:01:51,360 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 05:01:58,213 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-14 05:02:21,863 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 05:02:52,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2492020.0, ans=0.125 2024-08-14 05:03:01,415 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 05:03:01,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2492120.0, ans=0.125 2024-08-14 05:03:20,967 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2850, loss[loss=0.1181, beats_loss=0.01049, ecapa_loss=0.0001757, whisper_loss=0.1059, over 18199.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01057, ecapa_loss=0.0001555, whisper_loss=0.09206, over 3863765.35 frames. ], batch size: 73, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:03:21,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2492220.0, ans=0.125 2024-08-14 05:03:27,724 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 05:03:50,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2492420.0, ans=0.125 2024-08-14 05:03:57,465 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-14 05:04:06,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.327e+01 2.505e+01 2.806e+01 7.430e+01, threshold=5.010e+01, percent-clipped=1.0 2024-08-14 05:04:08,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-14 05:04:15,619 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 35 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 05:04:15,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2492520.0, ans=0.125 2024-08-14 05:04:37,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2900, loss[loss=0.09893, beats_loss=0.01147, ecapa_loss=0.0001411, whisper_loss=0.08605, over 19162.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01062, ecapa_loss=0.0001562, whisper_loss=0.09181, over 3864321.68 frames. ], batch size: 76, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:04:41,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2492720.0, ans=0.125 2024-08-14 05:04:56,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2492820.0, ans=0.125 2024-08-14 05:05:09,783 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 05:05:15,904 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 05:05:22,269 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-14 05:05:33,472 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 05:05:52,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 2950, loss[loss=0.09321, beats_loss=0.01337, ecapa_loss=0.0001175, whisper_loss=0.07866, over 17739.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001563, whisper_loss=0.09164, over 3864213.42 frames. ], batch size: 66, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:05:53,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2493220.0, ans=0.025 2024-08-14 05:05:55,738 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 05:06:02,525 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 05:06:17,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2493320.0, ans=0.125 2024-08-14 05:06:26,640 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.657e-02 2024-08-14 05:06:34,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.417e+01 2.624e+01 2.963e+01 8.640e+01, threshold=5.248e+01, percent-clipped=1.0 2024-08-14 05:06:48,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2493620.0, ans=0.125 2024-08-14 05:06:54,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-14 05:07:03,622 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3000, loss[loss=0.112, beats_loss=0.011, ecapa_loss=0.0001326, whisper_loss=0.09964, over 19063.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001569, whisper_loss=0.0916, over 3904692.13 frames. 
], batch size: 73, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:07:03,623 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 05:07:44,631 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on ASR_libri: loss=0.2518, beats_loss=0, ecapa_loss=0.0005463, whisper_loss=0.2464, over 922467.00 frames. 2024-08-14 05:08:00,105 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on SV_voxceleb1: loss=0.004304, beats_loss=0, ecapa_loss=0.0004304, whisper_loss=0, over 939242.00 frames. 2024-08-14 05:10:04,623 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on AT_audioset: loss=0.02354, beats_loss=0.02354, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 05:10:04,626 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 05:10:09,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2493720.0, ans=0.0 2024-08-14 05:10:10,742 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 05:10:20,656 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 05:10:23,393 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-14 05:10:26,726 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 05:10:31,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2493820.0, ans=0.1 2024-08-14 05:10:35,179 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 05:10:40,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2493920.0, ans=0.125 2024-08-14 05:10:44,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2493920.0, ans=0.125 2024-08-14 05:10:45,901 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 05:11:17,154 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3050, loss[loss=0.1017, beats_loss=0.01084, ecapa_loss=0.0001663, whisper_loss=0.08915, over 18662.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001568, whisper_loss=0.09158, over 3926230.63 frames. ], batch size: 75, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:11:27,317 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 05:11:32,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2494320.0, ans=0.0 2024-08-14 05:11:38,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=12.0 2024-08-14 05:11:43,066 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-14 05:11:59,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2494520.0, ans=0.125 2024-08-14 05:11:59,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.503e+01 2.783e+01 3.185e+01 5.631e+01, threshold=5.566e+01, percent-clipped=1.0 2024-08-14 05:12:07,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2494520.0, ans=0.0 2024-08-14 05:12:28,533 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3100, loss[loss=0.09629, beats_loss=0.01086, ecapa_loss=0.0001851, whisper_loss=0.08358, over 20762.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.0001564, whisper_loss=0.09201, over 3924856.85 frames. ], batch size: 88, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:12:36,429 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.675e+01 2024-08-14 05:13:00,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-08-14 05:13:02,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2494920.0, ans=0.0 2024-08-14 05:13:03,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2494920.0, ans=0.1 2024-08-14 05:13:06,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2494920.0, ans=0.0 2024-08-14 05:13:09,827 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 05:13:18,943 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 05:13:35,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2495120.0, ans=0.125 2024-08-14 05:13:40,888 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 05:13:42,083 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3150, loss[loss=0.09512, beats_loss=0.01388, ecapa_loss=0.0001392, whisper_loss=0.07985, over 21496.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001566, whisper_loss=0.0916, over 3916468.50 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:13:49,435 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 05:14:08,433 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 05:14:18,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2495420.0, ans=0.0 2024-08-14 05:14:21,582 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 05:14:25,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.581e+01 2.876e+01 7.737e+01, threshold=5.161e+01, percent-clipped=2.0 2024-08-14 05:14:39,476 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 05:14:43,946 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 16 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 05:14:49,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-14 05:14:52,900 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 05:14:55,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3200, loss[loss=0.07683, beats_loss=0.01031, ecapa_loss=0.0001655, whisper_loss=0.06486, over 15194.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.000157, whisper_loss=0.09085, over 3903327.78 frames. ], batch size: 61, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:15:16,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2495820.0, ans=0.125 2024-08-14 05:15:18,922 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 05:15:29,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.87 vs. limit=15.0 2024-08-14 05:15:38,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2496020.0, ans=0.125 2024-08-14 05:16:08,359 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3250, loss[loss=0.1284, beats_loss=0.007605, ecapa_loss=0.0001967, whisper_loss=0.1189, over 18287.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001578, whisper_loss=0.0907, over 3887535.30 frames. ], batch size: 71, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:16:21,875 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 05:16:23,284 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 11 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 05:16:29,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2024-08-14 05:16:31,819 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-14 05:16:39,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2496420.0, ans=0.125 2024-08-14 05:16:43,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2496420.0, ans=0.125 2024-08-14 05:16:47,566 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-14 05:16:51,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.408e+01 2.775e+01 3.145e+01 3.018e+02, threshold=5.551e+01, percent-clipped=3.0 2024-08-14 05:16:51,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2496520.0, ans=0.125 2024-08-14 05:16:57,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2496520.0, ans=0.125 2024-08-14 05:17:15,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2496620.0, ans=0.125 2024-08-14 05:17:17,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2496620.0, ans=0.125 2024-08-14 05:17:20,480 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3300, loss[loss=0.0841, beats_loss=0.01175, ecapa_loss=0.00019, whisper_loss=0.07045, over 17971.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01085, ecapa_loss=0.0001581, whisper_loss=0.09009, over 3861835.26 frames. 
], batch size: 78, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:17:21,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2496720.0, ans=0.125 2024-08-14 05:17:40,927 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-14 05:18:00,127 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 05:18:06,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2497020.0, ans=0.125 2024-08-14 05:18:12,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2497020.0, ans=0.125 2024-08-14 05:18:17,861 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 05:18:27,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2497120.0, ans=0.125 2024-08-14 05:18:33,901 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3350, loss[loss=0.09779, beats_loss=0.01052, ecapa_loss=0.0001465, whisper_loss=0.0858, over 16332.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001577, whisper_loss=0.09035, over 3873428.95 frames. ], batch size: 65, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:18:43,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2497220.0, ans=0.09899494936611666 2024-08-14 05:18:52,683 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.59 vs. 
limit=22.5 2024-08-14 05:18:55,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2497320.0, ans=0.125 2024-08-14 05:19:16,675 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-14 05:19:17,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.312e+01 2.517e+01 2.799e+01 4.556e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-14 05:19:33,955 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 05:19:35,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2497620.0, ans=0.125 2024-08-14 05:19:44,486 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 05:19:47,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3400, loss[loss=0.1104, beats_loss=0.009562, ecapa_loss=0.0001783, whisper_loss=0.09904, over 18079.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001577, whisper_loss=0.09019, over 3870000.40 frames. ], batch size: 72, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:20:10,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2497820.0, ans=0.0 2024-08-14 05:20:21,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2497920.0, ans=0.0 2024-08-14 05:20:29,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-08-14 05:20:30,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. 
limit=15.0 2024-08-14 05:20:36,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=15.0 2024-08-14 05:20:38,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2498020.0, ans=0.1 2024-08-14 05:20:43,252 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-14 05:20:44,001 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 05:20:45,235 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 05:20:52,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2498120.0, ans=0.125 2024-08-14 05:20:57,035 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 05:20:58,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2498220.0, ans=0.125 2024-08-14 05:20:59,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3450, loss[loss=0.08944, beats_loss=0.01114, ecapa_loss=0.0001427, whisper_loss=0.07688, over 16378.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001585, whisper_loss=0.09016, over 3893150.12 frames. ], batch size: 66, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:21:00,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2498220.0, ans=0.125 2024-08-14 05:21:01,434 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
20 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 05:21:17,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2498320.0, ans=0.0 2024-08-14 05:21:17,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2498320.0, ans=0.125 2024-08-14 05:21:27,751 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 05:21:35,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2498420.0, ans=0.125 2024-08-14 05:21:39,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2498420.0, ans=0.2 2024-08-14 05:21:41,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2498420.0, ans=0.2 2024-08-14 05:21:43,424 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.288e+01 2.702e+01 3.056e+01 2.683e+02, threshold=5.405e+01, percent-clipped=1.0 2024-08-14 05:21:44,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2498520.0, ans=0.0 2024-08-14 05:21:55,799 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:22:00,355 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:22:00,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-14 05:22:12,758 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3500, loss[loss=0.07645, beats_loss=0.01247, ecapa_loss=0.0001542, whisper_loss=0.06244, over 13517.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001586, whisper_loss=0.09029, over 3868071.16 frames. ], batch size: 54, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:22:13,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=15.0 2024-08-14 05:22:15,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2498720.0, ans=0.0 2024-08-14 05:22:33,083 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 05:22:35,941 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 05:22:36,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2498820.0, ans=0.125 2024-08-14 05:22:40,302 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 05:22:46,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2498920.0, ans=0.0 2024-08-14 05:22:54,971 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 05:22:56,293 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 05:23:01,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2499020.0, ans=0.0 2024-08-14 05:23:12,258 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 05:23:15,246 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 05:23:24,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2499220.0, ans=0.0 2024-08-14 05:23:25,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3550, loss[loss=0.08194, beats_loss=0.01002, ecapa_loss=0.0001383, whisper_loss=0.07053, over 14959.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001575, whisper_loss=0.09036, over 3891869.02 frames. ], batch size: 56, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:23:27,431 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-14 05:23:40,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2499320.0, ans=0.1 2024-08-14 05:23:41,788 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 05:23:42,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2499320.0, ans=0.2 2024-08-14 05:23:46,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2499320.0, ans=0.2 2024-08-14 05:23:49,502 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 05:23:59,655 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 05:24:04,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2499420.0, ans=0.125 2024-08-14 05:24:10,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.402e+01 2.607e+01 2.928e+01 5.339e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-14 05:24:32,230 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 05:24:37,069 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 05:24:39,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3600, loss[loss=0.083, beats_loss=0.01497, ecapa_loss=0.0001181, whisper_loss=0.06685, over 21252.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01083, ecapa_loss=0.0001556, whisper_loss=0.08967, over 3911608.78 frames. ], batch size: 86, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:24:46,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2499720.0, ans=0.1 2024-08-14 05:24:46,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0 2024-08-14 05:24:53,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2499820.0, ans=0.0 2024-08-14 05:25:16,587 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 05:25:25,539 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 05:25:37,146 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 05:25:53,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3650, loss[loss=0.115, beats_loss=0.008723, ecapa_loss=0.0002062, whisper_loss=0.1042, over 20585.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01077, ecapa_loss=0.0001568, whisper_loss=0.09, over 3889366.92 frames. ], batch size: 84, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:25:54,889 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 26 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 05:26:12,584 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 05:26:22,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2500420.0, ans=0.0 2024-08-14 05:26:33,645 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 05:26:38,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.437e+01 2.673e+01 3.010e+01 1.345e+02, threshold=5.347e+01, percent-clipped=1.0 2024-08-14 05:26:41,047 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 05:27:07,323 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3700, loss[loss=0.1085, beats_loss=0.01105, ecapa_loss=0.0001769, whisper_loss=0.09571, over 20874.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001567, whisper_loss=0.09035, over 3906027.22 frames. ], batch size: 82, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:27:09,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.10 vs. limit=22.5 2024-08-14 05:27:21,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-14 05:27:31,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-08-14 05:27:46,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2500920.0, ans=0.1 2024-08-14 05:27:47,805 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 05:27:56,271 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 05:27:56,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2501020.0, ans=0.125 2024-08-14 05:28:00,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-08-14 05:28:07,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2501120.0, ans=0.04949747468305833 2024-08-14 05:28:19,942 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3750, loss[loss=0.0984, beats_loss=0.01089, ecapa_loss=0.0001759, whisper_loss=0.08575, over 22118.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.0001563, whisper_loss=0.09131, over 3911882.01 frames. ], batch size: 90, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:28:20,170 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 05:28:33,269 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 05:28:46,621 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 05:28:47,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2501320.0, ans=0.0 2024-08-14 05:28:48,125 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 11 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 05:28:53,749 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 05:28:54,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.93 vs. 
limit=22.5 2024-08-14 05:28:59,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2501420.0, ans=0.0 2024-08-14 05:29:03,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.382e+01 2.609e+01 2.989e+01 8.009e+01, threshold=5.218e+01, percent-clipped=2.0 2024-08-14 05:29:08,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2501520.0, ans=0.2 2024-08-14 05:29:17,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2501620.0, ans=0.0 2024-08-14 05:29:17,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2501620.0, ans=0.0 2024-08-14 05:29:20,067 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 05:29:32,773 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3800, loss[loss=0.09469, beats_loss=0.01367, ecapa_loss=0.0001449, whisper_loss=0.07957, over 19080.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001569, whisper_loss=0.09097, over 3903793.41 frames. ], batch size: 79, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:29:46,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2501820.0, ans=0.125 2024-08-14 05:29:52,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2501820.0, ans=0.125 2024-08-14 05:30:02,582 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 05:30:02,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2501920.0, ans=0.025 2024-08-14 05:30:04,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2501920.0, ans=0.125 2024-08-14 05:30:05,488 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 05:30:21,843 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 05:30:22,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2502020.0, ans=0.1 2024-08-14 05:30:31,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2024-08-14 05:30:34,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2502120.0, ans=0.2 2024-08-14 05:30:43,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-14 05:30:46,314 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3850, loss[loss=0.09717, beats_loss=0.01094, ecapa_loss=0.0001494, whisper_loss=0.08473, over 21768.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001569, whisper_loss=0.0907, over 3910386.80 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:30:58,412 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
26 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 05:31:15,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2502420.0, ans=0.125 2024-08-14 05:31:29,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.353e+01 2.523e+01 2.870e+01 4.680e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-14 05:31:32,619 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 05:31:34,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2502520.0, ans=0.0 2024-08-14 05:31:37,190 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 05:31:37,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2502520.0, ans=0.2 2024-08-14 05:31:38,662 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 05:31:48,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2502620.0, ans=0.0 2024-08-14 05:31:55,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2502620.0, ans=0.125 2024-08-14 05:31:55,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2502620.0, ans=0.0 2024-08-14 05:31:59,156 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3900, loss[loss=0.1323, beats_loss=0.009532, ecapa_loss=0.0001771, whisper_loss=0.121, over 23413.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001589, whisper_loss=0.09062, over 3890800.38 frames. 
], batch size: 89, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:32:12,991 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.473e-02 2024-08-14 05:32:17,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2502820.0, ans=0.1 2024-08-14 05:32:19,681 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-14 05:32:21,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2502820.0, ans=0.125 2024-08-14 05:32:30,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2502920.0, ans=0.0 2024-08-14 05:32:31,455 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 05:32:32,796 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 05:32:33,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2502920.0, ans=0.04949747468305833 2024-08-14 05:32:34,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2502920.0, ans=0.125 2024-08-14 05:32:36,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2502920.0, ans=0.125 2024-08-14 05:33:09,173 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
35 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 05:33:11,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2503220.0, ans=0.0 2024-08-14 05:33:12,055 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 3950, loss[loss=0.07669, beats_loss=0.01541, ecapa_loss=0.0001225, whisper_loss=0.06005, over 19351.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001585, whisper_loss=0.09182, over 3920415.79 frames. ], batch size: 77, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:33:14,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2503220.0, ans=0.0 2024-08-14 05:33:15,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2503220.0, ans=0.2 2024-08-14 05:33:19,687 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 05:33:23,510 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 05:33:47,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2503420.0, ans=0.125 2024-08-14 05:33:55,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.430e+01 2.817e+01 3.192e+01 2.202e+02, threshold=5.633e+01, percent-clipped=4.0 2024-08-14 05:33:59,665 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:34:02,374 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 05:34:07,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2503520.0, ans=0.04949747468305833 2024-08-14 05:34:14,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2503620.0, ans=0.125 2024-08-14 05:34:19,880 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 05:34:25,030 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4000, loss[loss=0.127, beats_loss=0.009327, ecapa_loss=0.0001419, whisper_loss=0.1163, over 22936.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.000158, whisper_loss=0.09215, over 3917748.98 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:34:36,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2024-08-14 05:34:40,247 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 05:35:11,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2504020.0, ans=0.0 2024-08-14 05:35:21,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2024-08-14 05:35:31,525 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 05:35:38,628 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4050, loss[loss=0.1185, beats_loss=0.01032, ecapa_loss=0.0001651, whisper_loss=0.1065, over 23102.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01067, ecapa_loss=0.0001591, whisper_loss=0.09221, over 3904418.72 frames. 
], batch size: 92, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:35:40,273 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 05:36:00,761 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 05:36:14,392 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-14 05:36:19,368 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.829e+01 2024-08-14 05:36:22,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.679e+01 2.286e+01 2.527e+01 2.897e+01 4.039e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-14 05:36:36,399 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 23 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-14 05:36:38,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=12.0 2024-08-14 05:36:51,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4100, loss[loss=0.08688, beats_loss=0.01466, ecapa_loss=0.0001266, whisper_loss=0.07095, over 15530.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001587, whisper_loss=0.09185, over 3882726.98 frames. ], batch size: 61, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:36:54,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2504720.0, ans=0.0 2024-08-14 05:37:08,153 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 05:37:08,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. 
limit=15.0 2024-08-14 05:37:11,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2504820.0, ans=0.5 2024-08-14 05:37:25,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2504920.0, ans=0.125 2024-08-14 05:37:49,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2505120.0, ans=0.0 2024-08-14 05:37:52,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2505120.0, ans=6.0 2024-08-14 05:38:04,938 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4150, loss[loss=0.1005, beats_loss=0.0122, ecapa_loss=0.0001494, whisper_loss=0.08685, over 22561.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001594, whisper_loss=0.09171, over 3890693.38 frames. ], batch size: 92, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:38:07,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=15.0 2024-08-14 05:38:08,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2505220.0, ans=0.025 2024-08-14 05:38:18,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.61 vs. limit=22.5 2024-08-14 05:38:22,249 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-14 05:38:24,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2505320.0, ans=0.1 2024-08-14 05:38:32,029 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 05:38:32,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2505420.0, ans=0.125 2024-08-14 05:38:33,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2505420.0, ans=0.125 2024-08-14 05:38:37,911 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 05:38:49,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.395e+01 2.659e+01 2.961e+01 5.291e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-14 05:38:54,529 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 05:38:54,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2505520.0, ans=0.0 2024-08-14 05:39:00,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2505520.0, ans=0.0 2024-08-14 05:39:12,538 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-14 05:39:17,892 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4200, loss[loss=0.09973, beats_loss=0.0119, ecapa_loss=0.0001381, whisper_loss=0.08645, over 22226.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0107, ecapa_loss=0.0001583, whisper_loss=0.09183, over 3906934.82 frames. 
], batch size: 88, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:39:31,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2505820.0, ans=0.0 2024-08-14 05:39:33,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2505820.0, ans=0.125 2024-08-14 05:39:33,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2505820.0, ans=0.125 2024-08-14 05:39:36,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0 2024-08-14 05:39:43,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2505820.0, ans=0.0 2024-08-14 05:39:52,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2505920.0, ans=0.125 2024-08-14 05:39:56,328 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 05:40:10,972 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 05:40:11,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2506020.0, ans=0.125 2024-08-14 05:40:21,276 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 05:40:28,535 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 05:40:31,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4250, loss[loss=0.1104, beats_loss=0.01032, ecapa_loss=0.0001305, whisper_loss=0.09879, over 18413.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001574, whisper_loss=0.09191, over 3868808.06 frames. ], batch size: 70, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:40:45,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=15.0 2024-08-14 05:40:52,853 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 05:40:54,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2506320.0, ans=0.1 2024-08-14 05:40:59,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2506420.0, ans=0.0 2024-08-14 05:41:02,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2506420.0, ans=0.0 2024-08-14 05:41:05,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2506420.0, ans=0.1 2024-08-14 05:41:07,748 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 05:41:15,117 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 05:41:15,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2506520.0, ans=0.0 2024-08-14 05:41:15,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2506520.0, ans=0.125 2024-08-14 05:41:16,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.332e+01 2.542e+01 2.821e+01 5.499e+01, threshold=5.083e+01, percent-clipped=1.0 2024-08-14 05:41:21,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2506520.0, ans=0.125 2024-08-14 05:41:40,442 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 24 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-14 05:41:44,628 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4300, loss[loss=0.1261, beats_loss=0.007979, ecapa_loss=0.0001684, whisper_loss=0.1165, over 20972.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01056, ecapa_loss=0.0001579, whisper_loss=0.0921, over 3857914.60 frames. ], batch size: 80, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:42:12,090 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 05:42:22,504 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:42:55,249 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:42:58,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2507220.0, ans=0.125 2024-08-14 05:42:59,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4350, loss[loss=0.09791, beats_loss=0.01396, ecapa_loss=0.0001243, whisper_loss=0.0827, over 22147.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001567, whisper_loss=0.09142, over 3868853.87 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:43:10,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2024-08-14 05:43:14,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2507320.0, ans=0.125 2024-08-14 05:43:28,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2507420.0, ans=0.125 2024-08-14 05:43:35,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2024-08-14 05:43:36,030 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 05:43:41,602 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 05:43:43,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.369e+01 2.648e+01 3.108e+01 4.930e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-14 05:43:51,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2507520.0, ans=0.1 2024-08-14 05:43:54,474 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 05:44:08,275 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 05:44:12,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4400, loss[loss=0.06556, beats_loss=0.01535, ecapa_loss=0.0001001, whisper_loss=0.04922, over 15044.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001571, whisper_loss=0.09115, over 3873859.22 frames. ], batch size: 59, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:44:14,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-14 05:44:50,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-14 05:44:52,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2507920.0, ans=0.0 2024-08-14 05:45:06,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2508020.0, ans=0.05 2024-08-14 05:45:27,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4450, loss[loss=0.1115, beats_loss=0.009882, ecapa_loss=0.0001584, whisper_loss=0.1001, over 21036.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001561, whisper_loss=0.0907, over 3882219.40 frames. ], batch size: 82, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:45:31,288 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-14 05:45:56,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2508420.0, ans=0.2 2024-08-14 05:45:56,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2508420.0, ans=0.05 2024-08-14 05:45:58,398 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.94 vs. limit=10.0 2024-08-14 05:46:04,786 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 05:46:10,600 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 19 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-14 05:46:13,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.408e+01 2.704e+01 3.117e+01 4.091e+01, threshold=5.407e+01, percent-clipped=0.0 2024-08-14 05:46:16,478 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 05:46:42,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4500, loss[loss=0.0959, beats_loss=0.009531, ecapa_loss=0.0001575, whisper_loss=0.08479, over 14159.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001576, whisper_loss=0.09051, over 3863298.61 frames. ], batch size: 57, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:46:56,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2508820.0, ans=0.2 2024-08-14 05:46:58,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2508820.0, ans=0.125 2024-08-14 05:47:23,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2508920.0, ans=0.0 2024-08-14 05:47:32,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2509020.0, ans=0.07 2024-08-14 05:47:37,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2509020.0, ans=0.2 2024-08-14 05:47:57,086 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 17 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 05:48:01,704 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4550, loss[loss=0.09144, beats_loss=0.01018, ecapa_loss=0.0002137, whisper_loss=0.07913, over 21581.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001583, whisper_loss=0.09105, over 3878794.03 frames. ], batch size: 93, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:48:07,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2509220.0, ans=0.04949747468305833 2024-08-14 05:48:16,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2509320.0, ans=0.2 2024-08-14 05:48:27,344 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 05:48:34,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2509420.0, ans=0.07 2024-08-14 05:48:42,126 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 05:48:46,432 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 05:48:51,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.350e+01 2.581e+01 3.010e+01 9.450e+01, threshold=5.163e+01, percent-clipped=2.0 2024-08-14 05:48:53,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2509520.0, ans=0.1 2024-08-14 05:49:06,398 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2024-08-14 05:49:20,648 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4600, loss[loss=0.1039, beats_loss=0.01056, ecapa_loss=0.0001778, whisper_loss=0.09159, over 21808.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001584, whisper_loss=0.09055, over 3890731.42 frames. 
], batch size: 92, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:49:26,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2024-08-14 05:49:40,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2509820.0, ans=0.125 2024-08-14 05:49:57,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2509920.0, ans=0.125 2024-08-14 05:50:06,779 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 05:50:15,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2510020.0, ans=0.125 2024-08-14 05:50:18,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2510020.0, ans=0.125 2024-08-14 05:50:36,523 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 05:50:41,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4650, loss[loss=0.0925, beats_loss=0.01055, ecapa_loss=0.0001719, whisper_loss=0.08023, over 20588.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01066, ecapa_loss=0.0001585, whisper_loss=0.09133, over 3887367.01 frames. ], batch size: 83, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:50:54,432 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 05:50:56,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2510320.0, ans=0.125 2024-08-14 05:50:57,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2510320.0, ans=0.0 2024-08-14 05:50:58,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2510320.0, ans=0.2 2024-08-14 05:50:58,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2510320.0, ans=0.0 2024-08-14 05:50:59,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2510320.0, ans=0.1 2024-08-14 05:51:07,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2510320.0, ans=0.015 2024-08-14 05:51:30,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.365e+01 2.623e+01 2.877e+01 4.425e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-14 05:51:36,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2024-08-14 05:51:41,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=12.0 2024-08-14 05:51:53,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2510620.0, ans=0.125 2024-08-14 05:52:00,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4700, loss[loss=0.1073, beats_loss=0.01032, ecapa_loss=0.000154, whisper_loss=0.09548, over 15206.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001578, whisper_loss=0.09172, over 3888047.77 frames. ], batch size: 61, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:52:06,144 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-14 05:52:21,423 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 05:52:37,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2510920.0, ans=0.1 2024-08-14 05:52:42,828 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 05:52:51,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2511020.0, ans=0.2 2024-08-14 05:53:08,657 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-14 05:53:19,713 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4750, loss[loss=0.1062, beats_loss=0.01131, ecapa_loss=0.0001636, whisper_loss=0.09328, over 23173.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001575, whisper_loss=0.09093, over 3889800.94 frames. ], batch size: 94, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:53:21,711 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 05:53:30,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2511220.0, ans=0.125 2024-08-14 05:54:08,784 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.355e+01 2.556e+01 2.982e+01 9.125e+01, threshold=5.113e+01, percent-clipped=1.0 2024-08-14 05:54:27,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2511620.0, ans=0.125 2024-08-14 05:54:39,589 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4800, loss[loss=0.09548, beats_loss=0.01118, ecapa_loss=0.0001524, whisper_loss=0.08277, over 18255.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001575, whisper_loss=0.09021, over 3845475.29 frames. ], batch size: 74, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:54:46,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2511720.0, ans=0.125 2024-08-14 05:54:55,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2511820.0, ans=0.125 2024-08-14 05:55:28,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2512020.0, ans=0.1 2024-08-14 05:55:30,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2512020.0, ans=0.035 2024-08-14 05:55:33,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2512020.0, ans=0.125 2024-08-14 05:55:40,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2512020.0, ans=0.0 2024-08-14 05:55:52,159 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2512120.0, ans=0.125 2024-08-14 05:56:01,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4850, loss[loss=0.1157, beats_loss=0.007154, ecapa_loss=0.0002162, whisper_loss=0.1064, over 17115.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01082, ecapa_loss=0.0001585, whisper_loss=0.09041, over 3847491.97 frames. ], batch size: 72, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:56:16,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2512320.0, ans=0.125 2024-08-14 05:56:19,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2512320.0, ans=0.2 2024-08-14 05:56:23,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2512320.0, ans=0.0 2024-08-14 05:56:36,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0 2024-08-14 05:56:51,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.430e+01 2.607e+01 2.996e+01 1.441e+02, threshold=5.214e+01, percent-clipped=2.0 2024-08-14 05:57:08,537 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 05:57:14,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-14 05:57:22,310 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4900, loss[loss=0.09285, beats_loss=0.01084, ecapa_loss=0.0001376, whisper_loss=0.08064, over 21935.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001569, whisper_loss=0.09072, over 3845934.62 frames. 
], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:57:26,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2512720.0, ans=0.125 2024-08-14 05:57:39,991 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.519e+01 2024-08-14 05:57:49,900 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.98 vs. limit=22.5 2024-08-14 05:58:02,773 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 05:58:13,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2513020.0, ans=0.1 2024-08-14 05:58:13,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2513020.0, ans=0.125 2024-08-14 05:58:15,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2513020.0, ans=0.125 2024-08-14 05:58:25,016 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 05:58:28,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2513120.0, ans=0.0 2024-08-14 05:58:40,347 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 4950, loss[loss=0.09632, beats_loss=0.01074, ecapa_loss=0.0001401, whisper_loss=0.08418, over 22041.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.0001573, whisper_loss=0.09039, over 3826139.34 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:58:42,189 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 05:58:52,972 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 05:58:55,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2513320.0, ans=0.0 2024-08-14 05:58:59,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2513320.0, ans=0.0 2024-08-14 05:59:08,570 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 05:59:29,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.393e+01 2.657e+01 2.925e+01 4.625e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-14 05:59:44,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2513620.0, ans=0.04949747468305833 2024-08-14 05:59:47,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2513620.0, ans=0.09899494936611666 2024-08-14 05:59:53,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2024-08-14 05:59:58,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5000, loss[loss=0.09521, beats_loss=0.007866, ecapa_loss=0.0001933, whisper_loss=0.08541, over 13590.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001579, whisper_loss=0.09033, over 3823843.14 frames. ], batch size: 54, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:00:04,507 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 06:00:12,439 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 06:00:25,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2513820.0, ans=0.0 2024-08-14 06:00:54,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2514020.0, ans=0.0 2024-08-14 06:00:55,938 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 06:01:05,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2514120.0, ans=0.125 2024-08-14 06:01:15,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2514220.0, ans=0.125 2024-08-14 06:01:16,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5050, loss[loss=0.08621, beats_loss=0.01052, ecapa_loss=0.000181, whisper_loss=0.07388, over 18570.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01094, ecapa_loss=0.0001578, whisper_loss=0.08977, over 3845443.11 frames. ], batch size: 78, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:01:42,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2514320.0, ans=0.125 2024-08-14 06:01:59,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2514420.0, ans=0.0 2024-08-14 06:02:03,391 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
21 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 06:02:05,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.401e+01 2.602e+01 2.912e+01 4.134e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-14 06:02:18,172 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:02:18,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=2514620.0, ans=0.1 2024-08-14 06:02:24,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2514620.0, ans=0.2 2024-08-14 06:02:25,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2514620.0, ans=0.1 2024-08-14 06:02:26,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2514620.0, ans=0.0 2024-08-14 06:02:31,465 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 24 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-14 06:02:34,684 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5100, loss[loss=0.08683, beats_loss=0.01112, ecapa_loss=0.0001926, whisper_loss=0.07378, over 14595.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001569, whisper_loss=0.09081, over 3879600.86 frames. ], batch size: 62, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:03:00,214 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 06:03:16,456 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 17 from Vox, 29 fro AS
2024-08-14 06:03:23,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2515020.0, ans=0.2
2024-08-14 06:03:29,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2515020.0, ans=0.125
2024-08-14 06:03:29,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2515020.0, ans=0.07
2024-08-14 06:03:53,591 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5150, loss[loss=0.09646, beats_loss=0.01191, ecapa_loss=0.0001419, whisper_loss=0.08313, over 21884.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01089, ecapa_loss=0.0001559, whisper_loss=0.09038, over 3849773.84 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:04:05,003 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS
2024-08-14 06:04:07,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0
2024-08-14 06:04:10,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2515320.0, ans=0.125
2024-08-14 06:04:36,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2515420.0, ans=0.07
2024-08-14 06:04:37,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2515420.0, ans=0.125
2024-08-14 06:04:42,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.426e+01 2.661e+01 3.204e+01 7.186e+01, threshold=5.323e+01, percent-clipped=2.0
2024-08-14 06:04:45,608 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS
2024-08-14 06:04:47,351 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 fro AS
2024-08-14 06:04:55,569 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0
2024-08-14 06:05:12,755 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5200, loss[loss=0.09101, beats_loss=0.01161, ecapa_loss=0.0001433, whisper_loss=0.07797, over 22187.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01089, ecapa_loss=0.0001549, whisper_loss=0.09044, over 3838087.58 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:05:13,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2515720.0, ans=0.125
2024-08-14 06:05:13,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2515720.0, ans=0.0
2024-08-14 06:05:15,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5
2024-08-14 06:05:16,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2515720.0, ans=0.0
2024-08-14 06:05:19,357 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS
2024-08-14 06:05:31,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2515820.0, ans=0.0
2024-08-14 06:05:36,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2515820.0, ans=0.125
2024-08-14 06:05:43,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2515920.0, ans=0.125
2024-08-14 06:05:52,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2515920.0, ans=0.0
2024-08-14 06:05:56,660 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS
2024-08-14 06:06:16,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.91 vs. limit=22.5
2024-08-14 06:06:32,160 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5250, loss[loss=0.1089, beats_loss=0.009527, ecapa_loss=0.0001553, whisper_loss=0.09781, over 17340.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001564, whisper_loss=0.09067, over 3837513.98 frames. ], batch size: 65, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:06:34,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2516220.0, ans=0.0
2024-08-14 06:06:46,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2516320.0, ans=0.1
2024-08-14 06:06:47,738 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS
2024-08-14 06:06:48,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2516320.0, ans=0.125
2024-08-14 06:06:51,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2516320.0, ans=0.0
2024-08-14 06:07:08,596 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 27 from Vox, 28 fro AS
2024-08-14 06:07:12,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2516420.0, ans=0.125
2024-08-14 06:07:21,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.375e+01 2.671e+01 2.925e+01 9.126e+01, threshold=5.343e+01, percent-clipped=1.0
2024-08-14 06:07:30,959 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS
2024-08-14 06:07:31,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2516520.0, ans=0.0
2024-08-14 06:07:38,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2516620.0, ans=0.125
2024-08-14 06:07:52,381 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5300, loss[loss=0.1152, beats_loss=0.01152, ecapa_loss=0.0001345, whisper_loss=0.1024, over 23675.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001562, whisper_loss=0.09081, over 3847789.55 frames. ], batch size: 92, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:08:03,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2516720.0, ans=0.1
2024-08-14 06:08:03,890 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 06:08:04,831 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS
2024-08-14 06:08:12,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2516820.0, ans=0.125
2024-08-14 06:08:17,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2516820.0, ans=0.0
2024-08-14 06:08:19,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0
2024-08-14 06:08:20,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2516820.0, ans=0.125
2024-08-14 06:08:31,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2516920.0, ans=0.0
2024-08-14 06:08:31,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2516920.0, ans=0.125
2024-08-14 06:08:32,956 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 06:09:01,030 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 fro AS
2024-08-14 06:09:04,077 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 19 from LS+wenet, 29 from Vox, 37 fro AS
2024-08-14 06:09:08,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0
2024-08-14 06:09:12,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5350, loss[loss=0.09419, beats_loss=0.01264, ecapa_loss=0.0001392, whisper_loss=0.08016, over 16624.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01085, ecapa_loss=0.0001551, whisper_loss=0.0904, over 3871447.26 frames. ], batch size: 67, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:09:17,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2024-08-14 06:09:20,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2517220.0, ans=0.0
2024-08-14 06:09:21,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0
2024-08-14 06:09:25,973 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS
2024-08-14 06:09:28,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2517320.0, ans=0.125
2024-08-14 06:09:32,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2517320.0, ans=0.0
2024-08-14 06:09:33,867 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS
2024-08-14 06:09:44,940 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 24 from Vox, 25 fro AS
2024-08-14 06:09:49,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2517420.0, ans=0.05
2024-08-14 06:09:54,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2517420.0, ans=0.2
2024-08-14 06:10:01,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.326e+01 2.604e+01 3.065e+01 1.793e+02, threshold=5.208e+01, percent-clipped=2.0
2024-08-14 06:10:24,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2517620.0, ans=0.0
2024-08-14 06:10:29,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2517620.0, ans=0.2
2024-08-14 06:10:32,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5400, loss[loss=0.1124, beats_loss=0.009203, ecapa_loss=0.0001553, whisper_loss=0.1017, over 22293.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001552, whisper_loss=0.09096, over 3900579.91 frames. ], batch size: 87, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:10:32,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2517720.0, ans=0.125
2024-08-14 06:11:04,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2517920.0, ans=0.0
2024-08-14 06:11:06,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2517920.0, ans=0.125
2024-08-14 06:11:06,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2517920.0, ans=0.0
2024-08-14 06:11:12,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2517920.0, ans=0.2
2024-08-14 06:11:44,740 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS
2024-08-14 06:11:45,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2518120.0, ans=0.125
2024-08-14 06:11:46,500 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 fro AS
2024-08-14 06:11:50,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2518220.0, ans=0.0
2024-08-14 06:11:51,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5450, loss[loss=0.09074, beats_loss=0.01234, ecapa_loss=0.0001581, whisper_loss=0.07682, over 14511.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001561, whisper_loss=0.09108, over 3901777.39 frames. ], batch size: 59, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:11:52,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2518220.0, ans=0.2
2024-08-14 06:12:19,280 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS
2024-08-14 06:12:41,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.369e+01 2.569e+01 2.930e+01 1.155e+02, threshold=5.138e+01, percent-clipped=3.0
2024-08-14 06:12:46,305 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS
2024-08-14 06:12:54,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2518620.0, ans=0.125
2024-08-14 06:13:10,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5500, loss[loss=0.1111, beats_loss=0.009991, ecapa_loss=0.0001718, whisper_loss=0.09939, over 19667.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01085, ecapa_loss=0.0001563, whisper_loss=0.09091, over 3904560.03 frames. ], batch size: 76, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:13:23,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2518720.0, ans=0.0
2024-08-14 06:13:37,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2518820.0, ans=0.1
2024-08-14 06:13:54,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2518920.0, ans=0.1
2024-08-14 06:14:13,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.46 vs. limit=6.0
2024-08-14 06:14:30,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5550, loss[loss=0.1264, beats_loss=0.009249, ecapa_loss=0.0001598, whisper_loss=0.1156, over 23915.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001558, whisper_loss=0.09107, over 3918999.46 frames. ], batch size: 92, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:14:50,979 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS
2024-08-14 06:14:51,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2519320.0, ans=0.125
2024-08-14 06:14:51,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2519320.0, ans=0.0
2024-08-14 06:15:21,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.517e+01 2.810e+01 6.286e+01, threshold=5.034e+01, percent-clipped=1.0
2024-08-14 06:15:24,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0
2024-08-14 06:15:28,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0
2024-08-14 06:15:37,900 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 06:15:41,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0
2024-08-14 06:15:41,907 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS
2024-08-14 06:15:43,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2519620.0, ans=0.125
2024-08-14 06:15:50,520 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5600, loss[loss=0.1131, beats_loss=0.009398, ecapa_loss=0.0001649, whisper_loss=0.102, over 19715.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001559, whisper_loss=0.09125, over 3945924.16 frames. ], batch size: 72, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:15:53,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0
2024-08-14 06:15:58,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2519720.0, ans=0.125
2024-08-14 06:16:10,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2519820.0, ans=0.125
2024-08-14 06:16:22,823 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS
2024-08-14 06:16:27,452 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS
2024-08-14 06:16:40,601 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 23 from Vox, 21 fro AS
2024-08-14 06:16:42,394 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 19 from Vox, 15 fro AS
2024-08-14 06:16:51,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2520020.0, ans=0.1
2024-08-14 06:17:00,902 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS
2024-08-14 06:17:10,364 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5650, loss[loss=0.089, beats_loss=0.01085, ecapa_loss=0.0001607, whisper_loss=0.07655, over 18819.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001572, whisper_loss=0.09123, over 3929925.10 frames. ], batch size: 76, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:17:14,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2520220.0, ans=0.0
2024-08-14 06:17:19,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0
2024-08-14 06:17:39,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2520320.0, ans=0.125
2024-08-14 06:18:00,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.353e+01 2.635e+01 2.874e+01 6.701e+01, threshold=5.270e+01, percent-clipped=1.0
2024-08-14 06:18:08,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0
2024-08-14 06:18:32,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5700, loss[loss=0.1096, beats_loss=0.0102, ecapa_loss=0.0001275, whisper_loss=0.09809, over 19551.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01073, ecapa_loss=0.0001566, whisper_loss=0.09227, over 3980629.58 frames. ], batch size: 73, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:18:40,631 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 24 from Vox, 27 fro AS
2024-08-14 06:18:44,983 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS
2024-08-14 06:18:49,731 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS
2024-08-14 06:18:51,040 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 32 from Vox, 29 fro AS
2024-08-14 06:18:56,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2520820.0, ans=0.125
2024-08-14 06:18:59,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2520820.0, ans=0.1
2024-08-14 06:19:05,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2520920.0, ans=0.125
2024-08-14 06:19:13,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2520920.0, ans=0.1
2024-08-14 06:19:29,734 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 14 from LS+wenet, 23 from Vox, 29 fro AS
2024-08-14 06:19:34,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2521020.0, ans=0.1
2024-08-14 06:19:52,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5750, loss[loss=0.1099, beats_loss=0.009598, ecapa_loss=0.0001444, whisper_loss=0.0989, over 15537.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.000159, whisper_loss=0.09174, over 3943387.16 frames. ], batch size: 57, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:19:58,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2521220.0, ans=0.125
2024-08-14 06:20:00,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2521220.0, ans=0.0
2024-08-14 06:20:05,707 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS
2024-08-14 06:20:15,022 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS
2024-08-14 06:20:41,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.372e+01 2.640e+01 2.859e+01 6.893e+01, threshold=5.281e+01, percent-clipped=1.0
2024-08-14 06:21:12,403 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5800, loss[loss=0.0999, beats_loss=0.01208, ecapa_loss=0.0001447, whisper_loss=0.08637, over 21597.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001591, whisper_loss=0.09064, over 3921041.14 frames. ], batch size: 87, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:21:24,014 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0
2024-08-14 06:21:48,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2024-08-14 06:22:02,813 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS
2024-08-14 06:22:04,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2522020.0, ans=0.0
2024-08-14 06:22:17,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2522120.0, ans=0.07
2024-08-14 06:22:17,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.51 vs. limit=10.0
2024-08-14 06:22:22,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2522120.0, ans=0.125
2024-08-14 06:22:26,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5850, loss[loss=0.1024, beats_loss=0.009268, ecapa_loss=0.0001611, whisper_loss=0.09152, over 17431.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01085, ecapa_loss=0.0001571, whisper_loss=0.09037, over 3943483.66 frames. ], batch size: 69, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:22:31,037 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 24 from Vox, 34 fro AS
2024-08-14 06:22:31,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2522220.0, ans=0.0
2024-08-14 06:22:37,038 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 14 from Vox, 39 fro AS
2024-08-14 06:22:45,215 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS
2024-08-14 06:23:05,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2522420.0, ans=0.0
2024-08-14 06:23:10,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.428e+01 2.673e+01 2.941e+01 3.816e+01, threshold=5.346e+01, percent-clipped=0.0
2024-08-14 06:23:15,387 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS
2024-08-14 06:23:19,596 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS
2024-08-14 06:23:28,243 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS
2024-08-14 06:23:38,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5900, loss[loss=0.1107, beats_loss=0.009036, ecapa_loss=0.0001477, whisper_loss=0.1002, over 16872.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001579, whisper_loss=0.09025, over 3911415.20 frames. ], batch size: 63, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:23:41,162 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS
2024-08-14 06:23:54,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2522820.0, ans=0.0
2024-08-14 06:24:01,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2522820.0, ans=0.0
2024-08-14 06:24:11,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5
2024-08-14 06:24:12,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2522920.0, ans=0.125
2024-08-14 06:24:24,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2523020.0, ans=0.125
2024-08-14 06:24:33,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0
2024-08-14 06:24:39,174 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS
2024-08-14 06:24:47,696 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 5950, loss[loss=0.1111, beats_loss=0.01213, ecapa_loss=0.000171, whisper_loss=0.09723, over 21540.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01088, ecapa_loss=0.0001574, whisper_loss=0.08964, over 3888901.23 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:24:50,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2523220.0, ans=0.2
2024-08-14 06:24:59,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2523220.0, ans=0.125
2024-08-14 06:25:11,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2523320.0, ans=0.1
2024-08-14 06:25:13,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2523320.0, ans=0.2
2024-08-14 06:25:15,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2523420.0, ans=0.0
2024-08-14 06:25:19,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2523420.0, ans=0.0
2024-08-14 06:25:21,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2523420.0, ans=0.2
2024-08-14 06:25:29,388 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS
2024-08-14 06:25:30,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.432e+01 2.806e+01 3.149e+01 6.455e+01, threshold=5.612e+01, percent-clipped=2.0
2024-08-14 06:25:43,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2523620.0, ans=0.1
2024-08-14 06:25:55,671 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 25 from Vox, 26 fro AS
2024-08-14 06:25:56,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6000, loss[loss=0.1098, beats_loss=0.008408, ecapa_loss=0.0001849, whisper_loss=0.09953, over 18795.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01082, ecapa_loss=0.0001581, whisper_loss=0.09012, over 3872709.73 frames. ], batch size: 76, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:25:56,763 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-14 06:26:36,882 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on ASR_libri: loss=0.2513, beats_loss=0, ecapa_loss=0.0005424, whisper_loss=0.2459, over 922467.00 frames.
2024-08-14 06:26:55,861 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on SV_voxceleb1: loss=0.004393, beats_loss=0, ecapa_loss=0.0004393, whisper_loss=0, over 939242.00 frames.
2024-08-14 06:28:56,618 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on AT_audioset: loss=0.02347, beats_loss=0.02347, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 06:28:56,621 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-14 06:29:23,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2523920.0, ans=0.0
2024-08-14 06:29:24,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2523920.0, ans=0.0
2024-08-14 06:29:56,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2524120.0, ans=0.2
2024-08-14 06:30:05,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6050, loss[loss=0.08629, beats_loss=0.01327, ecapa_loss=0.0001107, whisper_loss=0.07191, over 20312.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001575, whisper_loss=0.09062, over 3861411.81 frames. ], batch size: 81, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:30:22,262 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS
2024-08-14 06:30:24,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.48 vs. limit=22.5
2024-08-14 06:30:41,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2524420.0, ans=0.95
2024-08-14 06:30:41,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2524420.0, ans=0.2
2024-08-14 06:30:49,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.348e+01 2.542e+01 2.875e+01 5.513e+01, threshold=5.084e+01, percent-clipped=0.0
2024-08-14 06:30:51,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2524520.0, ans=0.125
2024-08-14 06:30:56,106 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS
2024-08-14 06:31:03,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2524620.0, ans=0.125
2024-08-14 06:31:10,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2524620.0, ans=0.5
2024-08-14 06:31:15,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6100, loss[loss=0.09909, beats_loss=0.007479, ecapa_loss=0.0002015, whisper_loss=0.08959, over 13226.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01079, ecapa_loss=0.0001575, whisper_loss=0.09013, over 3838927.14 frames. ], batch size: 54, lr: 3.48e-03, grad_scale: 1.152921504606847e+18
2024-08-14 06:31:22,307 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 30 from Vox, 29 fro AS
2024-08-14 06:31:23,540 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS
2024-08-14 06:31:25,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.45 vs. limit=15.0
2024-08-14 06:31:52,347 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS
2024-08-14 06:31:53,659 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 12 from LS+wenet, 15 from Vox, 36 fro AS
2024-08-14 06:32:07,509 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS
2024-08-14 06:32:09,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2525020.0, ans=0.0
2024-08-14 06:32:21,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2525120.0, ans=0.0
2024-08-14 06:32:23,148 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 12 from LS+wenet, 24 from Vox, 25 fro AS
2024-08-14 06:32:25,691 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6150, loss[loss=0.08272, beats_loss=0.01297, ecapa_loss=0.0001549, whisper_loss=0.0682, over 18037.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001592, whisper_loss=0.0903, over 3874425.80 frames. ], batch size: 73, lr: 3.48e-03, grad_scale: 1.152921504606847e+18
2024-08-14 06:32:36,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2525220.0, ans=0.125
2024-08-14 06:32:44,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2525320.0, ans=0.1
2024-08-14 06:33:10,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.296e+01 2.588e+01 2.950e+01 9.161e+01, threshold=5.175e+01, percent-clipped=1.0
2024-08-14 06:33:34,756 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 28 from Vox, 30 fro AS
2024-08-14 06:33:37,957 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6200, loss[loss=0.1054, beats_loss=0.01161, ecapa_loss=0.0001512, whisper_loss=0.09223, over 21919.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01083, ecapa_loss=0.0001591, whisper_loss=0.08984, over 3856527.30 frames. ], batch size: 87, lr: 3.48e-03, grad_scale: 1.152921504606847e+18
2024-08-14 06:33:47,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2525720.0, ans=0.0
2024-08-14 06:33:58,850 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS
2024-08-14 06:33:59,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0
2024-08-14 06:34:04,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2525820.0, ans=0.125
2024-08-14 06:34:04,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0
2024-08-14 06:34:11,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2525920.0, ans=0.125
2024-08-14 06:34:16,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2525920.0, ans=0.125
2024-08-14 06:34:29,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2526020.0, ans=0.125
2024-08-14 06:34:31,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2526020.0, ans=0.0
2024-08-14 06:34:34,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2526020.0, ans=0.125
2024-08-14 06:34:39,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2526120.0, ans=0.125
2024-08-14 06:34:39,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. limit=6.0
2024-08-14 06:34:40,231 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 11 from Vox, 38 fro AS
2024-08-14 06:34:43,522 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS
2024-08-14 06:34:43,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2526120.0, ans=0.125
2024-08-14 06:34:54,183 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6250, loss[loss=0.09827, beats_loss=0.01176, ecapa_loss=0.0001315, whisper_loss=0.08519, over 16678.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.0001594, whisper_loss=0.08982, over 3855096.80 frames. ], batch size: 64, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:35:20,287 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS
2024-08-14 06:35:31,541 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS
2024-08-14 06:35:37,114 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 33 from LS+wenet, 8 from Vox, 15 fro AS
2024-08-14 06:35:44,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.485e+01 2.719e+01 3.146e+01 4.092e+01, threshold=5.438e+01, percent-clipped=0.0
2024-08-14 06:35:54,182 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS
2024-08-14 06:36:08,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2526620.0, ans=0.2
2024-08-14 06:36:12,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6300, loss[loss=0.09437, beats_loss=0.01177, ecapa_loss=0.0001304, whisper_loss=0.0813, over 22836.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001584, whisper_loss=0.08999, over 3867650.86 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 5.764607523034235e+17
2024-08-14 06:36:28,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0
2024-08-14 06:36:35,541 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.23 vs. limit=6.0
2024-08-14 06:36:50,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2526920.0, ans=0.125
2024-08-14 06:37:02,442 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts.
26 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 06:37:26,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2527120.0, ans=0.125 2024-08-14 06:37:28,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2527120.0, ans=0.09899494936611666 2024-08-14 06:37:30,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6350, loss[loss=0.09407, beats_loss=0.01306, ecapa_loss=0.0001777, whisper_loss=0.07923, over 21289.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001592, whisper_loss=0.09029, over 3874670.61 frames. ], batch size: 93, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:37:31,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2527220.0, ans=0.125 2024-08-14 06:37:36,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.17 vs. limit=6.0 2024-08-14 06:37:59,978 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 14 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-14 06:38:01,912 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 06:38:11,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2527420.0, ans=0.0 2024-08-14 06:38:19,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.286e+01 2.522e+01 2.892e+01 3.872e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 06:38:27,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2527520.0, ans=0.0 2024-08-14 06:38:28,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2527520.0, ans=0.5 2024-08-14 06:38:41,976 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 06:38:46,727 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 06:38:47,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6400, loss[loss=0.09726, beats_loss=0.01331, ecapa_loss=0.0001335, whisper_loss=0.08261, over 22628.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001606, whisper_loss=0.09008, over 3862792.06 frames. ], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:38:57,663 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-08-14 06:39:23,640 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 06:39:37,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2528020.0, ans=0.125 2024-08-14 06:39:49,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.70 vs. 
limit=15.0 2024-08-14 06:40:06,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6450, loss[loss=0.09416, beats_loss=0.00928, ecapa_loss=0.0001818, whisper_loss=0.08307, over 19809.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001597, whisper_loss=0.09031, over 3874302.62 frames. ], batch size: 80, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:40:39,107 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 06:40:56,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.368e+01 2.657e+01 3.046e+01 7.930e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-14 06:41:07,167 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:41:18,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2528620.0, ans=0.0 2024-08-14 06:41:24,190 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6500, loss[loss=0.1114, beats_loss=0.01112, ecapa_loss=0.0001969, whisper_loss=0.09836, over 22046.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001588, whisper_loss=0.09093, over 3924465.10 frames. ], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:41:25,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2528720.0, ans=0.09899494936611666 2024-08-14 06:41:35,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2024-08-14 06:41:52,625 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 06:41:58,685 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
28 from LS+wenet, 7 from Vox, 19 fro AS 2024-08-14 06:42:00,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2528920.0, ans=0.125 2024-08-14 06:42:26,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2529120.0, ans=0.2 2024-08-14 06:42:40,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2529120.0, ans=0.09899494936611666 2024-08-14 06:42:43,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6550, loss[loss=0.09727, beats_loss=0.01057, ecapa_loss=0.0001775, whisper_loss=0.08493, over 21330.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001588, whisper_loss=0.09157, over 3954078.07 frames. ], batch size: 88, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:43:14,039 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 06:43:35,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.448e+01 2.627e+01 2.898e+01 7.209e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-14 06:43:35,666 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 06:43:49,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2529620.0, ans=0.09899494936611666 2024-08-14 06:43:58,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2529620.0, ans=0.2 2024-08-14 06:44:05,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6600, loss[loss=0.1084, beats_loss=0.01017, ecapa_loss=0.0001275, whisper_loss=0.097, over 17977.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01068, ecapa_loss=0.0001587, whisper_loss=0.09209, over 3947457.35 frames. 
], batch size: 68, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:44:06,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2529720.0, ans=0.125 2024-08-14 06:44:11,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2529720.0, ans=0.125 2024-08-14 06:44:15,007 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 06:44:29,189 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 06:44:32,839 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-14 06:44:43,094 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-14 06:44:52,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-14 06:44:55,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2024-08-14 06:44:58,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-08-14 06:45:00,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2530020.0, ans=0.1 2024-08-14 06:45:05,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2024-08-14 06:45:21,937 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-14 06:45:23,839 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:45:28,178 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6650, loss[loss=0.1198, beats_loss=0.009402, ecapa_loss=0.0001377, whisper_loss=0.1091, over 25036.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01066, ecapa_loss=0.0001578, whisper_loss=0.09247, over 3977985.18 frames. ], batch size: 93, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:45:31,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-08-14 06:45:34,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0 2024-08-14 06:46:20,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.583e+01 2.896e+01 3.977e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 06:46:25,494 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 06:46:30,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2530520.0, ans=0.1 2024-08-14 06:46:48,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6700, loss[loss=0.08771, beats_loss=0.01373, ecapa_loss=0.0001598, whisper_loss=0.07238, over 20577.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01063, ecapa_loss=0.000157, whisper_loss=0.09272, over 3947322.63 frames. ], batch size: 87, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:47:00,261 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 06:47:19,320 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
30 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-14 06:47:20,942 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 06:47:33,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2530920.0, ans=0.0 2024-08-14 06:47:44,568 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 06:48:03,713 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 06:48:12,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2531120.0, ans=0.0 2024-08-14 06:48:16,239 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6750, loss[loss=0.101, beats_loss=0.01101, ecapa_loss=0.0001196, whisper_loss=0.08882, over 21233.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01055, ecapa_loss=0.0001581, whisper_loss=0.093, over 3964304.83 frames. ], batch size: 80, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:48:20,366 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 06:48:39,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=25.58 vs. 
limit=15.0 2024-08-14 06:48:48,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2531420.0, ans=0.125 2024-08-14 06:49:01,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2531420.0, ans=0.0 2024-08-14 06:49:04,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2531520.0, ans=0.0 2024-08-14 06:49:08,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.298e+01 2.539e+01 2.885e+01 4.400e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-14 06:49:11,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2531520.0, ans=0.125 2024-08-14 06:49:34,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2531620.0, ans=0.0 2024-08-14 06:49:39,204 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6800, loss[loss=0.115, beats_loss=0.007533, ecapa_loss=0.0001723, whisper_loss=0.1058, over 16553.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01053, ecapa_loss=0.0001581, whisper_loss=0.09293, over 3980254.83 frames. 
], batch size: 64, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:49:48,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2531720.0, ans=0.0 2024-08-14 06:49:55,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2531820.0, ans=0.125 2024-08-14 06:50:39,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2532020.0, ans=0.0 2024-08-14 06:50:54,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2024-08-14 06:50:59,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2532120.0, ans=0.125 2024-08-14 06:51:08,658 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 06:51:09,771 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6850, loss[loss=0.09036, beats_loss=0.01133, ecapa_loss=0.0001695, whisper_loss=0.07733, over 16799.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01062, ecapa_loss=0.000158, whisper_loss=0.09216, over 3962281.46 frames. ], batch size: 67, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:51:14,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2532220.0, ans=0.07 2024-08-14 06:51:14,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2532220.0, ans=0.0 2024-08-14 06:51:29,956 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 06:51:45,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2532420.0, ans=0.0 2024-08-14 06:51:53,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-14 06:52:02,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2532520.0, ans=0.0 2024-08-14 06:52:02,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2532520.0, ans=0.125 2024-08-14 06:52:03,061 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.338e+01 2.591e+01 2.972e+01 6.425e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-14 06:52:03,208 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 06:52:40,377 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6900, loss[loss=0.09419, beats_loss=0.01114, ecapa_loss=0.0001205, whisper_loss=0.08185, over 17303.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001586, whisper_loss=0.09136, over 3899721.61 frames. ], batch size: 66, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:53:07,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-08-14 06:53:21,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. 
limit=6.0 2024-08-14 06:53:23,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2532920.0, ans=0.2 2024-08-14 06:53:36,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2532920.0, ans=0.04949747468305833 2024-08-14 06:53:45,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2533020.0, ans=0.2 2024-08-14 06:54:08,542 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 06:54:17,050 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 06:54:22,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2533120.0, ans=0.125 2024-08-14 06:54:30,200 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 6950, loss[loss=0.08757, beats_loss=0.01356, ecapa_loss=0.0001155, whisper_loss=0.07285, over 23343.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001574, whisper_loss=0.0911, over 3911156.03 frames. ], batch size: 92, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:54:31,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2533220.0, ans=0.125 2024-08-14 06:54:35,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2533220.0, ans=0.0 2024-08-14 06:55:02,903 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 16 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-14 06:55:17,632 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 06:55:18,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2533420.0, ans=0.0 2024-08-14 06:55:26,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2024-08-14 06:55:27,255 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 06:55:32,711 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-14 06:55:41,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.379e+01 2.570e+01 2.940e+01 3.906e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-14 06:55:57,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2533620.0, ans=0.02 2024-08-14 06:56:20,527 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7000, loss[loss=0.1146, beats_loss=0.01046, ecapa_loss=0.0001589, whisper_loss=0.1026, over 23732.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001589, whisper_loss=0.09142, over 3895573.29 frames. ], batch size: 90, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:56:32,784 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 06:56:53,056 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 06:56:53,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. 
limit=15.0 2024-08-14 06:57:12,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533920.0, ans=0.1 2024-08-14 06:57:35,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-14 06:57:37,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2534020.0, ans=0.0 2024-08-14 06:57:48,519 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:58:01,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7050, loss[loss=0.1069, beats_loss=0.0104, ecapa_loss=0.0001415, whisper_loss=0.09506, over 19674.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001599, whisper_loss=0.09198, over 3875020.72 frames. ], batch size: 77, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:58:15,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2534320.0, ans=0.0 2024-08-14 06:58:33,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2534420.0, ans=0.125 2024-08-14 06:58:48,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.340e+01 2.637e+01 3.082e+01 1.011e+02, threshold=5.275e+01, percent-clipped=1.0 2024-08-14 06:59:06,066 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.74 vs. 
limit=15.0 2024-08-14 06:59:09,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2534620.0, ans=0.125 2024-08-14 06:59:13,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7100, loss[loss=0.121, beats_loss=0.009197, ecapa_loss=0.0001406, whisper_loss=0.1104, over 15877.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001589, whisper_loss=0.0923, over 3882970.57 frames. ], batch size: 58, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:59:23,293 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-14 06:59:25,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2534720.0, ans=0.07 2024-08-14 06:59:28,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2534820.0, ans=0.0 2024-08-14 06:59:56,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.35 vs. limit=6.0 2024-08-14 07:00:08,956 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2024-08-14 07:00:10,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.24 vs. limit=10.0 2024-08-14 07:00:14,985 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 07:00:29,899 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-14 07:00:31,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7150, loss[loss=0.1109, beats_loss=0.01172, ecapa_loss=0.0001232, whisper_loss=0.09791, over 18783.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01071, ecapa_loss=0.0001588, whisper_loss=0.092, over 3872303.59 frames. ], batch size: 71, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:00:32,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2535220.0, ans=0.2 2024-08-14 07:00:37,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-14 07:00:40,968 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 07:00:56,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2535320.0, ans=0.125 2024-08-14 07:00:57,628 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 07:01:14,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2535420.0, ans=0.125 2024-08-14 07:01:20,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.290e+01 2.562e+01 2.920e+01 7.577e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 07:01:47,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7200, loss[loss=0.09786, beats_loss=0.01201, ecapa_loss=0.0001582, whisper_loss=0.08427, over 16479.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01071, ecapa_loss=0.0001587, whisper_loss=0.09219, over 3902162.92 frames. ], batch size: 66, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:01:52,278 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 07:01:52,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2535720.0, ans=0.2 2024-08-14 07:01:53,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2535720.0, ans=0.125 2024-08-14 07:01:54,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-14 07:02:02,846 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 7 from Vox, 30 fro AS 2024-08-14 07:02:43,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2536020.0, ans=0.09899494936611666 2024-08-14 07:02:47,083 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 07:02:51,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2536120.0, ans=0.2 2024-08-14 07:03:03,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2536220.0, ans=0.125 2024-08-14 07:03:03,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2536220.0, ans=0.1 2024-08-14 07:03:04,625 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7250, loss[loss=0.1135, beats_loss=0.008703, ecapa_loss=0.0001727, whisper_loss=0.1031, over 22346.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01065, ecapa_loss=0.0001574, whisper_loss=0.09308, over 3949822.42 frames. 
], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:03:25,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2536320.0, ans=0.09899494936611666 2024-08-14 07:03:32,744 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-14 07:03:42,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-14 07:03:55,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.431e+01 2.606e+01 2.894e+01 4.565e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-14 07:04:11,372 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:04:15,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2536620.0, ans=0.1 2024-08-14 07:04:20,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2536620.0, ans=0.0 2024-08-14 07:04:22,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7300, loss[loss=0.09339, beats_loss=0.01152, ecapa_loss=0.0001907, whisper_loss=0.07997, over 21077.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01064, ecapa_loss=0.0001588, whisper_loss=0.09259, over 3926634.43 frames. 
], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:04:24,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2536720.0, ans=0.125 2024-08-14 07:04:35,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2536720.0, ans=0.125 2024-08-14 07:04:37,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2536820.0, ans=15.0 2024-08-14 07:04:42,740 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 07:05:18,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2537020.0, ans=0.125 2024-08-14 07:05:21,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2537020.0, ans=0.1 2024-08-14 07:05:29,747 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 07:05:38,172 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7350, loss[loss=0.1119, beats_loss=0.00863, ecapa_loss=0.0001783, whisper_loss=0.1015, over 21666.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01066, ecapa_loss=0.0001588, whisper_loss=0.09166, over 3902137.79 frames. ], batch size: 87, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:05:44,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2537220.0, ans=0.0 2024-08-14 07:05:50,935 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 07:05:55,698 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:06:01,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2537320.0, ans=0.125 2024-08-14 07:06:06,102 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 07:06:06,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2537320.0, ans=0.1 2024-08-14 07:06:07,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2537420.0, ans=0.125 2024-08-14 07:06:26,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.465e+01 2.701e+01 2.899e+01 2.044e+02, threshold=5.402e+01, percent-clipped=2.0 2024-08-14 07:06:35,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2537520.0, ans=0.125 2024-08-14 07:06:53,450 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 07:06:54,425 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7400, loss[loss=0.1224, beats_loss=0.008637, ecapa_loss=0.0001835, whisper_loss=0.112, over 19795.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01063, ecapa_loss=0.0001586, whisper_loss=0.09177, over 3885357.21 frames. 
], batch size: 75, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:07:10,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2537820.0, ans=0.1 2024-08-14 07:07:14,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2537820.0, ans=0.125 2024-08-14 07:07:14,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2537820.0, ans=0.1 2024-08-14 07:07:17,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2537820.0, ans=0.125 2024-08-14 07:07:27,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2537920.0, ans=0.1 2024-08-14 07:07:37,622 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 07:07:43,479 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-14 07:07:58,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.52 vs. limit=15.0 2024-08-14 07:08:01,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2538120.0, ans=0.2 2024-08-14 07:08:08,636 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 07:08:12,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7450, loss[loss=0.1085, beats_loss=0.009315, ecapa_loss=0.000201, whisper_loss=0.09715, over 19605.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001585, whisper_loss=0.09137, over 3945182.71 frames. 
], batch size: 81, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:08:18,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2538220.0, ans=0.1 2024-08-14 07:08:27,807 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 07:09:05,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.389e+01 2.665e+01 3.000e+01 5.031e+01, threshold=5.329e+01, percent-clipped=0.0 2024-08-14 07:09:31,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2538620.0, ans=0.0 2024-08-14 07:09:33,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7500, loss[loss=0.0905, beats_loss=0.01363, ecapa_loss=0.0001414, whisper_loss=0.07546, over 18659.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001581, whisper_loss=0.09131, over 3914412.99 frames. ], batch size: 75, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:09:34,762 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 07:09:35,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2538720.0, ans=0.125 2024-08-14 07:09:43,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2538720.0, ans=0.0 2024-08-14 07:09:52,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2538820.0, ans=0.0 2024-08-14 07:10:00,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2538820.0, ans=0.125 2024-08-14 07:10:23,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2024-08-14 07:10:34,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2539020.0, ans=0.125 2024-08-14 07:10:39,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2539120.0, ans=0.1 2024-08-14 07:10:40,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2539120.0, ans=0.125 2024-08-14 07:10:43,499 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 07:10:54,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7550, loss[loss=0.1273, beats_loss=0.0077, ecapa_loss=0.0001516, whisper_loss=0.1181, over 16970.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.000158, whisper_loss=0.09099, over 3901666.44 frames. ], batch size: 62, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:10:56,383 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 07:11:05,209 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 07:11:07,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2539220.0, ans=0.0 2024-08-14 07:11:24,637 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-14 07:11:26,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-08-14 07:11:28,403 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 11 from Vox, 44 fro AS 2024-08-14 07:11:46,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.298e+01 2.593e+01 2.946e+01 4.435e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 07:11:55,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5 2024-08-14 07:12:04,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2539620.0, ans=0.0 2024-08-14 07:12:08,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2539620.0, ans=0.0 2024-08-14 07:12:15,442 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7600, loss[loss=0.1121, beats_loss=0.008659, ecapa_loss=0.0001732, whisper_loss=0.1017, over 17312.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001577, whisper_loss=0.09148, over 3866889.84 frames. 
], batch size: 67, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:12:28,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2539720.0, ans=0.0 2024-08-14 07:12:41,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2539820.0, ans=0.0 2024-08-14 07:12:43,335 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 07:12:49,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2539920.0, ans=0.07 2024-08-14 07:13:06,098 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 07:13:13,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2540020.0, ans=10.0 2024-08-14 07:13:33,818 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7650, loss[loss=0.1054, beats_loss=0.009836, ecapa_loss=0.0001487, whisper_loss=0.09411, over 20049.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001575, whisper_loss=0.09106, over 3859291.14 frames. ], batch size: 79, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:13:37,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2540220.0, ans=15.0 2024-08-14 07:13:38,486 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:13:39,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2540220.0, ans=0.125 2024-08-14 07:13:46,144 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 07:13:57,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2540320.0, ans=0.125 2024-08-14 07:13:58,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2540320.0, ans=0.2 2024-08-14 07:14:12,856 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 07:14:24,130 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 07:14:25,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.299e+01 2.494e+01 2.918e+01 5.997e+01, threshold=4.989e+01, percent-clipped=1.0 2024-08-14 07:14:28,196 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 07:14:34,492 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 07:14:53,652 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7700, loss[loss=0.09319, beats_loss=0.0109, ecapa_loss=0.0001477, whisper_loss=0.08081, over 22610.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001575, whisper_loss=0.09097, over 3877192.54 frames. ], batch size: 93, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:15:08,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2540820.0, ans=0.0 2024-08-14 07:15:08,877 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.71 vs. 
limit=12.0 2024-08-14 07:15:28,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2540920.0, ans=0.125 2024-08-14 07:15:32,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2540920.0, ans=0.0 2024-08-14 07:15:53,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2541020.0, ans=0.0 2024-08-14 07:15:57,326 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 07:15:57,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2541120.0, ans=0.125 2024-08-14 07:15:58,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2541120.0, ans=0.125 2024-08-14 07:15:58,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2541120.0, ans=0.0 2024-08-14 07:16:13,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7750, loss[loss=0.1095, beats_loss=0.01023, ecapa_loss=0.0001857, whisper_loss=0.09745, over 19020.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.000157, whisper_loss=0.09104, over 3897531.69 frames. ], batch size: 77, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:16:29,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2541320.0, ans=0.1 2024-08-14 07:16:41,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2541320.0, ans=0.0 2024-08-14 07:16:44,200 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 07:16:52,261 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
34 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 07:17:02,098 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:17:04,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.379e+01 2.592e+01 2.812e+01 4.047e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-14 07:17:06,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2024-08-14 07:17:15,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2541620.0, ans=0.04949747468305833 2024-08-14 07:17:19,005 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 07:17:20,331 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 07:17:23,489 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 07:17:27,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-14 07:17:30,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7800, loss[loss=0.09185, beats_loss=0.01058, ecapa_loss=0.0001758, whisper_loss=0.07951, over 16324.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001567, whisper_loss=0.09129, over 3895412.18 frames. 
], batch size: 66, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:18:07,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2541920.0, ans=0.125 2024-08-14 07:18:12,069 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.393e+00 2024-08-14 07:18:22,099 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 07:18:31,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2542120.0, ans=0.125 2024-08-14 07:18:36,571 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 07:18:41,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-08-14 07:18:43,543 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7850, loss[loss=0.1216, beats_loss=0.01027, ecapa_loss=0.0001501, whisper_loss=0.1098, over 22874.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01066, ecapa_loss=0.0001574, whisper_loss=0.09166, over 3872044.24 frames. ], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:18:43,676 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
29 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-14 07:18:49,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2542220.0, ans=0.125 2024-08-14 07:19:01,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2542320.0, ans=0.125 2024-08-14 07:19:15,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2542420.0, ans=0.2 2024-08-14 07:19:18,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2542420.0, ans=0.0 2024-08-14 07:19:29,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.387e+01 2.602e+01 2.902e+01 1.105e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 07:19:43,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2542620.0, ans=0.0 2024-08-14 07:19:46,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2542620.0, ans=0.0 2024-08-14 07:19:54,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7900, loss[loss=0.08821, beats_loss=0.01059, ecapa_loss=0.0001275, whisper_loss=0.07635, over 22472.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001577, whisper_loss=0.09124, over 3868429.35 frames. 
], batch size: 87, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:20:00,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2542720.0, ans=0.125 2024-08-14 07:20:03,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2542720.0, ans=0.0 2024-08-14 07:20:08,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2542820.0, ans=0.1 2024-08-14 07:20:12,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2542820.0, ans=0.0 2024-08-14 07:20:18,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-14 07:20:28,855 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04889056831598282, model_norm_threshold=52.03104019165039 2024-08-14 07:20:29,022 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.668e+05, grad_sumsq=1.668e+05, orig_rms_sq=1.000e+00 2024-08-14 07:20:34,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2542920.0, ans=0.125 2024-08-14 07:20:39,540 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 07:20:41,936 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=22.5 2024-08-14 07:20:48,373 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 07:20:58,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2543120.0, ans=0.125 2024-08-14 07:21:00,927 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 07:21:06,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 7950, loss[loss=0.1085, beats_loss=0.01123, ecapa_loss=0.0001511, whisper_loss=0.09579, over 19877.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0107, ecapa_loss=0.0001573, whisper_loss=0.09183, over 3879940.21 frames. ], batch size: 78, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:21:26,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2543320.0, ans=0.2 2024-08-14 07:21:28,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2543320.0, ans=0.125 2024-08-14 07:21:35,172 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 07:21:45,383 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-14 07:21:50,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2543520.0, ans=0.1 2024-08-14 07:21:54,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.373e+01 2.668e+01 3.252e+01 1.064e+03, threshold=5.336e+01, percent-clipped=2.0 2024-08-14 07:22:19,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8000, loss[loss=0.1183, beats_loss=0.009821, ecapa_loss=0.0001696, whisper_loss=0.1068, over 22967.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01068, ecapa_loss=0.0001573, whisper_loss=0.09203, over 3911710.60 frames. 
], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:22:20,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-08-14 07:22:56,846 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 07:22:58,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2543920.0, ans=0.125 2024-08-14 07:23:12,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2544020.0, ans=0.1 2024-08-14 07:23:14,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-14 07:23:29,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2544120.0, ans=0.1 2024-08-14 07:23:33,339 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8050, loss[loss=0.1106, beats_loss=0.009794, ecapa_loss=0.0001532, whisper_loss=0.09929, over 18694.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.000158, whisper_loss=0.09188, over 3907689.99 frames. ], batch size: 75, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:23:39,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2544220.0, ans=0.125 2024-08-14 07:23:43,701 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 07:23:48,318 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
19 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 07:24:11,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2544420.0, ans=0.2 2024-08-14 07:24:13,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2544420.0, ans=0.2 2024-08-14 07:24:16,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2024-08-14 07:24:20,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.420e+01 2.579e+01 3.062e+01 1.369e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-14 07:24:22,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-14 07:24:34,715 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.164e+01 2024-08-14 07:24:45,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8100, loss[loss=0.08346, beats_loss=0.01459, ecapa_loss=0.0001097, whisper_loss=0.06777, over 21341.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.000158, whisper_loss=0.09159, over 3889471.82 frames. ], batch size: 86, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:24:54,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=12.0 2024-08-14 07:25:02,642 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 07:25:05,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2544820.0, ans=0.0 2024-08-14 07:25:13,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2544920.0, ans=0.0 2024-08-14 07:25:18,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2544920.0, ans=0.125 2024-08-14 07:25:20,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2544920.0, ans=0.0 2024-08-14 07:25:28,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2545020.0, ans=0.1 2024-08-14 07:25:32,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2545020.0, ans=0.0 2024-08-14 07:25:35,558 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.016e+00 2024-08-14 07:25:42,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.30 vs. limit=22.5 2024-08-14 07:25:50,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2545120.0, ans=0.2 2024-08-14 07:25:55,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8150, loss[loss=0.08631, beats_loss=0.01219, ecapa_loss=0.0001687, whisper_loss=0.07243, over 22041.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001593, whisper_loss=0.09113, over 3889884.62 frames. 
], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:26:08,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2545320.0, ans=0.125
2024-08-14 07:26:12,656 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 from AS
2024-08-14 07:26:26,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.40 vs. limit=10.0
2024-08-14 07:26:33,203 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 07:26:41,166 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.412e+01 2.672e+01 3.051e+01 4.273e+01, threshold=5.344e+01, percent-clipped=0.0
2024-08-14 07:26:57,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2545620.0, ans=0.125
2024-08-14 07:26:59,563 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 from AS
2024-08-14 07:27:06,479 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8200, loss[loss=0.1252, beats_loss=0.008686, ecapa_loss=0.0001644, whisper_loss=0.1149, over 23695.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001588, whisper_loss=0.09127, over 3944521.64 frames. ], batch size: 93, lr: 3.47e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:27:06,592 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 from AS
2024-08-14 07:27:09,424 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 from AS
2024-08-14 07:27:22,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.66 vs. limit=15.0
2024-08-14 07:27:22,567 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 from AS
2024-08-14 07:27:32,820 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 from AS
2024-08-14 07:27:42,400 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 from AS
2024-08-14 07:28:01,799 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 07:28:03,985 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 from AS
2024-08-14 07:28:14,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0
2024-08-14 07:28:18,251 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8250, loss[loss=0.08878, beats_loss=0.0119, ecapa_loss=0.0001316, whisper_loss=0.07557, over 17723.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001582, whisper_loss=0.09111, over 3928546.19 frames. ], batch size: 70, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:28:18,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2546220.0, ans=0.1
2024-08-14 07:28:33,555 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 18 from Vox, 43 from AS
2024-08-14 07:28:34,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0
2024-08-14 07:28:41,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0
2024-08-14 07:28:42,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0
2024-08-14 07:28:48,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0
2024-08-14 07:28:51,961 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 42 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 07:28:55,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2024-08-14 07:29:03,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.321e+01 2.631e+01 2.915e+01 1.588e+02, threshold=5.262e+01, percent-clipped=1.0
2024-08-14 07:29:23,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2546620.0, ans=0.125
2024-08-14 07:29:28,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2546620.0, ans=0.125
2024-08-14 07:29:32,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8300, loss[loss=0.09155, beats_loss=0.00818, ecapa_loss=0.0001489, whisper_loss=0.08188, over 15146.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001576, whisper_loss=0.09079, over 3938653.44 frames. ], batch size: 55, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:29:32,634 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 from AS
2024-08-14 07:29:48,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0
2024-08-14 07:29:50,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0
2024-08-14 07:29:55,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2546820.0, ans=0.125
2024-08-14 07:30:10,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2546920.0, ans=0.125
2024-08-14 07:30:11,987 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 33 from LS+wenet, 15 from Vox, 33 from AS
2024-08-14 07:30:17,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2546920.0, ans=0.1
2024-08-14 07:30:44,388 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 from AS
2024-08-14 07:30:48,532 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8350, loss[loss=0.09893, beats_loss=0.01216, ecapa_loss=0.0001412, whisper_loss=0.08536, over 21004.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001567, whisper_loss=0.09106, over 3910175.43 frames. ], batch size: 85, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:30:53,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2547220.0, ans=0.0
2024-08-14 07:30:53,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2547220.0, ans=0.0
2024-08-14 07:31:18,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0
2024-08-14 07:31:22,035 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 32 from Vox, 37 from AS
2024-08-14 07:31:35,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.329e+01 2.543e+01 2.806e+01 3.860e+01, threshold=5.087e+01, percent-clipped=0.0
2024-08-14 07:31:56,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2547620.0, ans=0.125
2024-08-14 07:32:01,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8400, loss[loss=0.1095, beats_loss=0.01065, ecapa_loss=0.000144, whisper_loss=0.09739, over 23636.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001557, whisper_loss=0.09158, over 3931053.99 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:32:29,935 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 07:32:32,829 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 07:32:47,827 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 from AS
2024-08-14 07:32:48,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=15.0
2024-08-14 07:33:03,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2548120.0, ans=0.07
2024-08-14 07:33:13,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2548220.0, ans=6.0
2024-08-14 07:33:13,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8450, loss[loss=0.1107, beats_loss=0.01077, ecapa_loss=0.0001685, whisper_loss=0.09826, over 22839.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.0001559, whisper_loss=0.09131, over 3912541.24 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:33:15,437 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 07:33:41,577 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 13 from Vox, 45 from AS
2024-08-14 07:33:59,486 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.373e+01 2.582e+01 3.046e+01 4.610e+01, threshold=5.164e+01, percent-clipped=0.0
2024-08-14 07:34:17,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2548620.0, ans=0.125
2024-08-14 07:34:25,521 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8500, loss[loss=0.09215, beats_loss=0.01282, ecapa_loss=0.0001353, whisper_loss=0.07798, over 14503.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001551, whisper_loss=0.09112, over 3923968.71 frames. ], batch size: 57, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:34:40,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2548820.0, ans=0.0
2024-08-14 07:34:53,675 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 26 from Vox, 36 from AS
2024-08-14 07:35:07,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.62 vs. limit=22.5
2024-08-14 07:35:26,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2549120.0, ans=0.07
2024-08-14 07:35:30,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2549120.0, ans=0.125
2024-08-14 07:35:36,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8550, loss[loss=0.1219, beats_loss=0.00682, ecapa_loss=0.0001426, whisper_loss=0.1136, over 15402.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.000157, whisper_loss=0.09155, over 3915662.65 frames. ], batch size: 54, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:35:45,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2549220.0, ans=0.0
2024-08-14 07:35:46,206 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 from AS
2024-08-14 07:35:50,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2549320.0, ans=0.2
2024-08-14 07:36:07,897 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 from AS
2024-08-14 07:36:22,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.413e+01 2.608e+01 2.932e+01 1.178e+02, threshold=5.217e+01, percent-clipped=2.0
2024-08-14 07:36:23,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2549520.0, ans=0.125
2024-08-14 07:36:25,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2549520.0, ans=0.0
2024-08-14 07:36:31,344 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 14 from LS+wenet, 25 from Vox, 31 from AS
2024-08-14 07:36:44,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=12.0
2024-08-14 07:36:47,770 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 from AS
2024-08-14 07:36:50,408 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8600, loss[loss=0.1118, beats_loss=0.01062, ecapa_loss=0.0001288, whisper_loss=0.09991, over 23079.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.0001549, whisper_loss=0.09131, over 3906209.12 frames. ], batch size: 89, lr: 3.47e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:37:10,196 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 from AS
2024-08-14 07:37:42,683 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 07:38:09,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8650, loss[loss=0.1025, beats_loss=0.01256, ecapa_loss=0.0001485, whisper_loss=0.08844, over 19114.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001563, whisper_loss=0.09126, over 3886890.75 frames. ], batch size: 79, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:38:42,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2550420.0, ans=0.125
2024-08-14 07:38:56,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.324e+01 2.530e+01 2.821e+01 3.799e+01, threshold=5.059e+01, percent-clipped=0.0
2024-08-14 07:39:18,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0
2024-08-14 07:39:20,835 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8700, loss[loss=0.09359, beats_loss=0.009649, ecapa_loss=0.0002299, whisper_loss=0.08164, over 21658.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001566, whisper_loss=0.09108, over 3882643.57 frames. ], batch size: 97, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:39:32,490 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 38 from LS+wenet, 17 from Vox, 39 from AS
2024-08-14 07:39:49,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2550920.0, ans=0.125
2024-08-14 07:39:58,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0
2024-08-14 07:40:19,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2551120.0, ans=0.125
2024-08-14 07:40:31,588 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8750, loss[loss=0.08406, beats_loss=0.0134, ecapa_loss=0.0001587, whisper_loss=0.06907, over 21680.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001572, whisper_loss=0.09086, over 3867322.95 frames. ], batch size: 93, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:40:54,976 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 37 from LS+wenet, 15 from Vox, 32 from AS
2024-08-14 07:41:07,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2551420.0, ans=0.2
2024-08-14 07:41:09,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2551420.0, ans=0.125
2024-08-14 07:41:12,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2551420.0, ans=0.125
2024-08-14 07:41:16,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.282e+01 2.583e+01 2.856e+01 3.464e+01, threshold=5.167e+01, percent-clipped=0.0
2024-08-14 07:41:25,384 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 14 from Vox, 33 from AS
2024-08-14 07:41:42,157 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8800, loss[loss=0.09778, beats_loss=0.01038, ecapa_loss=0.0001478, whisper_loss=0.08592, over 15424.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001568, whisper_loss=0.09106, over 3868936.02 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:41:47,946 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 07:42:16,495 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 17 from Vox, 46 from AS
2024-08-14 07:42:18,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0
2024-08-14 07:42:20,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0
2024-08-14 07:42:31,216 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 07:42:32,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2552020.0, ans=0.125
2024-08-14 07:42:43,797 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0
2024-08-14 07:42:54,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8850, loss[loss=0.1108, beats_loss=0.0104, ecapa_loss=0.0001828, whisper_loss=0.09861, over 21041.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001559, whisper_loss=0.09206, over 3886360.12 frames. ], batch size: 87, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:43:01,371 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 18 from Vox, 40 from AS
2024-08-14 07:43:07,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2552320.0, ans=0.125
2024-08-14 07:43:08,753 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 from AS
2024-08-14 07:43:09,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2552320.0, ans=0.125
2024-08-14 07:43:27,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=12.0
2024-08-14 07:43:28,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2552420.0, ans=0.125
2024-08-14 07:43:39,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.372e+01 2.652e+01 3.112e+01 4.829e+01, threshold=5.305e+01, percent-clipped=0.0
2024-08-14 07:43:40,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2552520.0, ans=0.125
2024-08-14 07:43:45,058 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 from AS
2024-08-14 07:44:05,164 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8900, loss[loss=0.1339, beats_loss=0.009435, ecapa_loss=0.0001327, whisper_loss=0.1232, over 24138.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001564, whisper_loss=0.09175, over 3918466.41 frames. ], batch size: 88, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:44:18,949 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 from AS
2024-08-14 07:44:20,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0
2024-08-14 07:44:25,898 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 from AS
2024-08-14 07:44:31,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.26 vs. limit=22.5
2024-08-14 07:44:33,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2552920.0, ans=0.2
2024-08-14 07:44:34,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2552920.0, ans=0.0
2024-08-14 07:44:54,709 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 from AS
2024-08-14 07:45:01,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2553020.0, ans=0.125
2024-08-14 07:45:03,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2553120.0, ans=0.0
2024-08-14 07:45:09,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2553120.0, ans=0.125
2024-08-14 07:45:16,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 8950, loss[loss=0.1075, beats_loss=0.009518, ecapa_loss=0.0002199, whisper_loss=0.09582, over 18429.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001563, whisper_loss=0.09138, over 3917809.46 frames. ], batch size: 79, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:45:21,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2553220.0, ans=0.125
2024-08-14 07:45:22,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0
2024-08-14 07:45:48,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2553420.0, ans=0.1
2024-08-14 07:45:54,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2553420.0, ans=0.125
2024-08-14 07:46:03,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.450e+01 2.761e+01 3.148e+01 4.518e+01, threshold=5.522e+01, percent-clipped=0.0
2024-08-14 07:46:04,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2553520.0, ans=0.0
2024-08-14 07:46:06,702 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 from AS
2024-08-14 07:46:15,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2553620.0, ans=0.125
2024-08-14 07:46:17,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2553620.0, ans=0.125
2024-08-14 07:46:27,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9000, loss[loss=0.09899, beats_loss=0.01071, ecapa_loss=0.0001703, whisper_loss=0.08657, over 22700.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001568, whisper_loss=0.09102, over 3910005.78 frames. ], batch size: 92, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:46:27,842 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-14 07:47:08,592 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005502, whisper_loss=0.2473, over 922467.00 frames.
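An aside for reading the `tot_loss[...]` entries in this log: the headline `loss` appears to be the scale-weighted sum of the three distillation losses (BEATs, ECAPA, Whisper). A minimal sketch of that arithmetic, assuming `beats_loss_scale=1.0` and `ecapa_loss_scale=10.0` as suggested by the run configuration at the top of this log (the helper name `combined_loss` is ours, not from `train_multi_KD3.py`):

```python
# Sketch: reconstruct the headline "loss" reported in the tot_loss entries.
# Assumption: loss = beats_scale * beats_loss + ecapa_scale * ecapa_loss + whisper_loss,
# with beats_loss_scale=1.0 and ecapa_loss_scale=10.0 taken from the run config.
def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0):
    return beats_scale * beats_loss + ecapa_scale * ecapa_loss + whisper_loss

# Values from the "Epoch 18, batch 8200" tot_loss entry:
total = combined_loss(beats_loss=0.01065, ecapa_loss=0.0001588, whisper_loss=0.09127)
print(round(total, 4))  # 0.1035, matching the logged tot_loss
```

The same weighting is consistent with the other `tot_loss` entries in this section (e.g. batch 8250: 0.01066 + 10 x 0.0001582 + 0.09111 = 0.1033).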
2024-08-14 07:47:28,484 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames. 2024-08-14 07:49:28,223 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 07:49:28,227 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 07:49:28,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2553720.0, ans=0.1 2024-08-14 07:49:29,818 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 07:49:34,012 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 13 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 07:49:35,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2553720.0, ans=0.125 2024-08-14 07:49:39,476 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 07:49:45,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2553820.0, ans=0.125 2024-08-14 07:50:16,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2554020.0, ans=0.2 2024-08-14 07:50:17,597 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 07:50:23,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2554120.0, ans=0.09899494936611666 2024-08-14 07:50:28,846 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
21 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 07:50:29,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0 2024-08-14 07:50:30,118 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-14 07:50:34,552 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 07:50:38,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9050, loss[loss=0.08133, beats_loss=0.01183, ecapa_loss=0.0001708, whisper_loss=0.06779, over 17327.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.000157, whisper_loss=0.09149, over 3912824.05 frames. ], batch size: 72, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:50:56,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2554320.0, ans=0.125 2024-08-14 07:51:01,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-14 07:51:23,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2554520.0, ans=0.1 2024-08-14 07:51:25,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.02 vs. 
limit=22.5 2024-08-14 07:51:27,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.420e+01 2.680e+01 3.001e+01 5.357e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-14 07:51:33,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2554520.0, ans=0.125 2024-08-14 07:51:55,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9100, loss[loss=0.1005, beats_loss=0.01254, ecapa_loss=0.0001437, whisper_loss=0.08647, over 21706.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.000158, whisper_loss=0.09133, over 3928356.27 frames. ], batch size: 85, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:52:04,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2554720.0, ans=0.125 2024-08-14 07:52:07,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-08-14 07:52:11,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2554820.0, ans=0.0 2024-08-14 07:52:15,816 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 07:52:40,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2554920.0, ans=0.125 2024-08-14 07:52:45,734 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 07:52:46,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2555020.0, ans=15.0 2024-08-14 07:52:57,434 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
30 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-14 07:53:01,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2555120.0, ans=15.0 2024-08-14 07:53:09,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2555120.0, ans=0.0 2024-08-14 07:53:11,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9150, loss[loss=0.1062, beats_loss=0.009528, ecapa_loss=0.0001601, whisper_loss=0.09506, over 20246.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001561, whisper_loss=0.09107, over 3953088.46 frames. ], batch size: 83, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:53:25,000 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 07:53:29,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2555320.0, ans=0.2 2024-08-14 07:53:31,680 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 07:53:38,789 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 07:53:47,645 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 07:53:57,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.410e+01 2.700e+01 3.056e+01 6.075e+01, threshold=5.399e+01, percent-clipped=3.0 2024-08-14 07:54:21,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9200, loss[loss=0.1083, beats_loss=0.006361, ecapa_loss=0.0001813, whisper_loss=0.1002, over 17338.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001567, whisper_loss=0.09102, over 3918951.09 frames. 
], batch size: 66, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:54:23,528 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 07:54:24,927 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 07:54:31,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2024-08-14 07:54:48,687 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=15.0 2024-08-14 07:55:02,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2555920.0, ans=0.125 2024-08-14 07:55:11,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2556020.0, ans=0.0 2024-08-14 07:55:26,080 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 07:55:27,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2556120.0, ans=0.125 2024-08-14 07:55:33,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9250, loss[loss=0.09593, beats_loss=0.008943, ecapa_loss=0.0001844, whisper_loss=0.08514, over 19291.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.0001579, whisper_loss=0.09035, over 3894932.85 frames. 
], batch size: 79, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:55:33,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2556220.0, ans=0.2 2024-08-14 07:55:40,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2556220.0, ans=0.125 2024-08-14 07:55:49,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2556320.0, ans=0.0 2024-08-14 07:56:02,031 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:56:06,076 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 25 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-14 07:56:09,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2556420.0, ans=0.125 2024-08-14 07:56:18,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-08-14 07:56:20,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.326e+01 2.739e+01 3.130e+01 4.617e+01, threshold=5.478e+01, percent-clipped=0.0 2024-08-14 07:56:20,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2556520.0, ans=0.125 2024-08-14 07:56:43,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9300, loss[loss=0.08812, beats_loss=0.01313, ecapa_loss=0.0001434, whisper_loss=0.07356, over 22012.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001583, whisper_loss=0.0912, over 3921325.88 frames. 
], batch size: 92, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:57:12,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2556920.0, ans=0.015 2024-08-14 07:57:23,521 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 07:57:48,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2557120.0, ans=0.2 2024-08-14 07:57:51,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=12.0 2024-08-14 07:57:55,086 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-14 07:57:56,454 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9350, loss[loss=0.08839, beats_loss=0.0118, ecapa_loss=0.0001894, whisper_loss=0.07469, over 21182.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001582, whisper_loss=0.09069, over 3886746.66 frames. ], batch size: 87, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:57:56,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2557220.0, ans=0.125 2024-08-14 07:57:57,376 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.14 vs. limit=10.0 2024-08-14 07:58:15,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2557320.0, ans=0.1 2024-08-14 07:58:23,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2557420.0, ans=0.125 2024-08-14 07:58:25,998 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
19 from LS+wenet, 24 from Vox, 49 from AS 2024-08-14 07:58:27,308 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 07:58:32,136 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-08-14 07:58:42,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.320e+01 2.600e+01 2.954e+01 6.976e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-14 07:58:47,019 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 20 from LS+wenet, 20 from Vox, 52 from AS 2024-08-14 07:58:47,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2557520.0, ans=0.1 2024-08-14 07:58:57,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2557620.0, ans=0.125 2024-08-14 07:59:03,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2557620.0, ans=0.125 2024-08-14 07:59:06,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2557720.0, ans=0.125 2024-08-14 07:59:06,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9400, loss[loss=0.09556, beats_loss=0.01236, ecapa_loss=0.0001659, whisper_loss=0.08154, over 20816.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001572, whisper_loss=0.09025, over 3870023.92 frames. ], batch size: 91, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:59:24,027 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
15 from LS+wenet, 24 from Vox, 35 from AS 2024-08-14 07:59:37,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-08-14 07:59:45,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2557920.0, ans=15.0 2024-08-14 08:00:01,713 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 from AS 2024-08-14 08:00:04,395 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 41 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 08:00:13,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2558120.0, ans=0.07 2024-08-14 08:00:17,373 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9450, loss[loss=0.1101, beats_loss=0.01434, ecapa_loss=0.0001203, whisper_loss=0.09456, over 23467.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01084, ecapa_loss=0.0001563, whisper_loss=0.08984, over 3873033.09 frames. ], batch size: 92, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:00:30,514 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 from AS 2024-08-14 08:00:33,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2558320.0, ans=0.125 2024-08-14 08:00:33,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-14 08:00:41,829 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
17 from LS+wenet, 20 from Vox, 34 from AS 2024-08-14 08:00:47,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2558420.0, ans=0.125 2024-08-14 08:01:04,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.392e+01 2.683e+01 2.998e+01 2.159e+02, threshold=5.366e+01, percent-clipped=1.0 2024-08-14 08:01:04,983 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 08:01:10,836 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 from AS 2024-08-14 08:01:27,475 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 from AS 2024-08-14 08:01:28,855 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9500, loss[loss=0.09076, beats_loss=0.01041, ecapa_loss=0.0002119, whisper_loss=0.07823, over 16197.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001577, whisper_loss=0.09026, over 3889571.83 frames. 
], batch size: 69, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:01:55,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2558820.0, ans=0.0 2024-08-14 08:02:17,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2559020.0, ans=0.125 2024-08-14 08:02:17,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2559020.0, ans=0.1 2024-08-14 08:02:20,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2559020.0, ans=0.2 2024-08-14 08:02:33,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2559120.0, ans=0.2 2024-08-14 08:02:39,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9550, loss[loss=0.1073, beats_loss=0.01246, ecapa_loss=0.0001358, whisper_loss=0.09349, over 23450.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01082, ecapa_loss=0.0001578, whisper_loss=0.08935, over 3885006.28 frames. ], batch size: 92, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:02:40,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2559220.0, ans=0.125 2024-08-14 08:02:51,101 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 from AS 2024-08-14 08:03:01,676 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=12.0 2024-08-14 08:03:03,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=22.5 2024-08-14 08:03:11,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2559420.0, ans=0.1 2024-08-14 08:03:25,278 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 31 from Vox, 38 from AS 2024-08-14 08:03:26,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.380e+01 2.664e+01 3.149e+01 1.810e+02, threshold=5.328e+01, percent-clipped=2.0 2024-08-14 08:03:30,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=12.0 2024-08-14 08:03:50,389 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9600, loss[loss=0.1178, beats_loss=0.01088, ecapa_loss=0.0001384, whisper_loss=0.1055, over 22399.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0108, ecapa_loss=0.0001571, whisper_loss=0.08987, over 3879028.85 frames. ], batch size: 88, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:03:53,673 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS 2024-08-14 08:03:54,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2559720.0, ans=0.2 2024-08-14 08:04:00,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2559720.0, ans=0.125 2024-08-14 08:04:06,551 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 08:04:20,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2024-08-14 08:04:46,226 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
21 from LS+wenet, 15 from Vox, 25 from AS 2024-08-14 08:05:00,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2560120.0, ans=0.125 2024-08-14 08:05:01,863 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 from AS 2024-08-14 08:05:06,101 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9650, loss[loss=0.1069, beats_loss=0.01205, ecapa_loss=0.0001566, whisper_loss=0.09329, over 22864.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01078, ecapa_loss=0.0001583, whisper_loss=0.08937, over 3826766.94 frames. ], batch size: 92, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:05:13,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2560220.0, ans=0.2 2024-08-14 08:05:17,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2560220.0, ans=10.0 2024-08-14 08:05:19,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2560320.0, ans=0.0 2024-08-14 08:05:22,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2560320.0, ans=0.125 2024-08-14 08:05:29,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-14 08:05:32,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.98 vs. 
limit=15.0 2024-08-14 08:05:39,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2560420.0, ans=0.1 2024-08-14 08:05:44,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:46,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-08-14 08:05:52,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.378e+01 2.605e+01 3.057e+01 7.649e+01, threshold=5.209e+01, percent-clipped=3.0 2024-08-14 08:06:01,169 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 27 from Vox, 27 from AS 2024-08-14 08:06:11,184 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 28 from Vox, 23 from AS 2024-08-14 08:06:16,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9700, loss[loss=0.1186, beats_loss=0.01063, ecapa_loss=0.0001835, whisper_loss=0.1062, over 22488.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001604, whisper_loss=0.08968, over 3801152.33 frames. ], batch size: 93, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:06:31,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-14 08:06:38,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2560820.0, ans=0.125 2024-08-14 08:06:40,878 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
22 from LS+wenet, 16 from Vox, 19 from AS 2024-08-14 08:06:48,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2560920.0, ans=0.125 2024-08-14 08:06:51,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2560920.0, ans=0.0 2024-08-14 08:07:00,038 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 13 from Vox, 39 from AS 2024-08-14 08:07:10,220 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 from AS 2024-08-14 08:07:27,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2561220.0, ans=0.125 2024-08-14 08:07:28,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9750, loss[loss=0.0983, beats_loss=0.01189, ecapa_loss=0.0001475, whisper_loss=0.08494, over 23636.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0107, ecapa_loss=0.0001592, whisper_loss=0.08944, over 3820855.81 frames. ], batch size: 93, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:07:47,402 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 27 from Vox, 31 from AS 2024-08-14 08:08:06,189 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
20 from LS+wenet, 20 from Vox, 39 from AS 2024-08-14 08:08:16,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.202e+01 2.404e+01 2.626e+01 3.852e+01, threshold=4.808e+01, percent-clipped=0.0 2024-08-14 08:08:25,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2561620.0, ans=0.0 2024-08-14 08:08:35,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2561620.0, ans=0.1 2024-08-14 08:08:40,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9800, loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001653, whisper_loss=0.09052, over 18860.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.000159, whisper_loss=0.09022, over 3861925.20 frames. ], batch size: 75, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:09:04,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2561820.0, ans=0.125 2024-08-14 08:09:12,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2561920.0, ans=0.125 2024-08-14 08:09:17,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.47 vs. limit=22.5 2024-08-14 08:09:28,424 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 from AS 2024-08-14 08:09:35,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2562120.0, ans=0.125 2024-08-14 08:09:38,025 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 19 from Vox, 52 from AS 2024-08-14 08:09:46,113 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
33 from LS+wenet, 14 from Vox, 22 from AS 2024-08-14 08:09:50,571 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9850, loss[loss=0.1143, beats_loss=0.01, ecapa_loss=0.0001393, whisper_loss=0.1029, over 15925.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001596, whisper_loss=0.09182, over 3872249.45 frames. ], batch size: 61, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:09:52,163 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 22 from Vox, 39 from AS 2024-08-14 08:10:04,532 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 08:10:16,008 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 from AS 2024-08-14 08:10:20,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2562420.0, ans=0.1 2024-08-14 08:10:36,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.409e+01 2.672e+01 2.970e+01 5.427e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 08:10:38,326 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 from AS 2024-08-14 08:10:48,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2562620.0, ans=0.125 2024-08-14 08:11:00,326 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9900, loss[loss=0.1283, beats_loss=0.008846, ecapa_loss=0.0001632, whisper_loss=0.1178, over 23052.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001595, whisper_loss=0.09186, over 3895876.85 frames. 
], batch size: 91, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:11:02,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2562720.0, ans=0.0 2024-08-14 08:11:08,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2562720.0, ans=0.0 2024-08-14 08:11:16,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2562820.0, ans=0.0 2024-08-14 08:11:25,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2562820.0, ans=0.125 2024-08-14 08:11:46,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2563020.0, ans=0.0 2024-08-14 08:12:03,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2563120.0, ans=0.125 2024-08-14 08:12:11,577 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 9950, loss[loss=0.09962, beats_loss=0.01232, ecapa_loss=0.0001298, whisper_loss=0.08601, over 17340.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001591, whisper_loss=0.09189, over 3895514.97 frames. ], batch size: 68, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:12:13,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2563220.0, ans=0.95 2024-08-14 08:12:27,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2563320.0, ans=0.125 2024-08-14 08:12:42,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. 
limit=10.0 2024-08-14 08:12:42,714 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 from AS 2024-08-14 08:12:58,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.516e+01 2.952e+01 4.420e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 08:13:04,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2563520.0, ans=0.125 2024-08-14 08:13:08,385 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 29 from Vox, 40 from AS 2024-08-14 08:13:21,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2563720.0, ans=0.5 2024-08-14 08:13:22,565 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10000, loss[loss=0.1266, beats_loss=0.008582, ecapa_loss=0.0001583, whisper_loss=0.1164, over 18578.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01072, ecapa_loss=0.0001592, whisper_loss=0.09186, over 3893751.95 frames. ], batch size: 71, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:13:29,551 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=15.0 2024-08-14 08:13:55,231 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 from AS 2024-08-14 08:14:03,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.77 vs. limit=15.0 2024-08-14 08:14:14,371 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 25 from Vox, 24 from AS 2024-08-14 08:14:16,642 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.25 vs. 
limit=15.0 2024-08-14 08:14:23,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5 2024-08-14 08:14:33,673 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10050, loss[loss=0.1301, beats_loss=0.01024, ecapa_loss=0.0001663, whisper_loss=0.1182, over 19027.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001586, whisper_loss=0.09174, over 3895983.99 frames. ], batch size: 76, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:14:36,461 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 from AS 2024-08-14 08:14:49,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2024-08-14 08:14:52,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. 
limit=22.5 2024-08-14 08:14:57,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2564320.0, ans=0.125 2024-08-14 08:15:02,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564420.0, ans=0.1 2024-08-14 08:15:07,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2564420.0, ans=0.5 2024-08-14 08:15:17,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564520.0, ans=0.1 2024-08-14 08:15:22,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.332e+01 2.576e+01 2.987e+01 4.902e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 08:15:27,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2564520.0, ans=0.0 2024-08-14 08:15:28,452 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 08:15:29,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2024-08-14 08:15:40,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2564620.0, ans=0.025 2024-08-14 08:15:41,547 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-14 08:15:42,680 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
13 from LS+wenet, 20 from Vox, 23 from AS 2024-08-14 08:15:43,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2564620.0, ans=0.2 2024-08-14 08:15:47,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10100, loss[loss=0.1026, beats_loss=0.01188, ecapa_loss=0.0001541, whisper_loss=0.08915, over 18207.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001577, whisper_loss=0.09153, over 3906302.19 frames. ], batch size: 72, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:15:56,609 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 from AS 2024-08-14 08:16:17,792 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 from AS 2024-08-14 08:16:28,097 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 from AS 2024-08-14 08:16:36,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2565020.0, ans=0.125 2024-08-14 08:17:03,887 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 08:17:12,497 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10150, loss[loss=0.09776, beats_loss=0.01128, ecapa_loss=0.0001601, whisper_loss=0.08488, over 16160.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001588, whisper_loss=0.09144, over 3893650.43 frames. ], batch size: 69, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:17:26,454 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 08:17:34,407 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS 2024-08-14 08:17:46,970 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 16 from Vox, 32 from AS 2024-08-14 08:18:03,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2565520.0, ans=0.125 2024-08-14 08:18:08,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.364e+01 2.632e+01 2.890e+01 4.484e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-14 08:18:26,682 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=22.5 2024-08-14 08:18:31,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2024-08-14 08:18:36,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10200, loss[loss=0.09638, beats_loss=0.007161, ecapa_loss=0.000202, whisper_loss=0.0872, over 15028.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001593, whisper_loss=0.09152, over 3871481.46 frames. ], batch size: 62, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:18:42,936 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=15.0 2024-08-14 08:18:45,316 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
24 from LS+wenet, 23 from Vox, 26 from AS 2024-08-14 08:18:59,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2565820.0, ans=0.125 2024-08-14 08:19:08,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2565820.0, ans=0.125 2024-08-14 08:19:18,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2565920.0, ans=0.125 2024-08-14 08:19:20,237 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.530e+01 2024-08-14 08:19:27,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2565920.0, ans=0.125 2024-08-14 08:19:28,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2566020.0, ans=0.125 2024-08-14 08:19:29,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5 2024-08-14 08:19:40,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2566020.0, ans=0.0 2024-08-14 08:19:56,177 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 19 from Vox, 28 from AS 2024-08-14 08:19:57,442 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-14 08:19:59,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2566120.0, ans=0.95 2024-08-14 08:20:04,435 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10250, loss[loss=0.1109, beats_loss=0.009823, ecapa_loss=0.0001568, whisper_loss=0.09947, over 22233.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01055, ecapa_loss=0.0001596, whisper_loss=0.09222, over 3907380.56 frames. ], batch size: 87, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:20:28,634 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 from AS 2024-08-14 08:20:36,435 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 from AS 2024-08-14 08:20:40,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2566420.0, ans=0.0 2024-08-14 08:20:50,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2566420.0, ans=0.125 2024-08-14 08:20:57,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.331e+01 2.528e+01 2.980e+01 4.721e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-14 08:21:02,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2566520.0, ans=0.0 2024-08-14 08:21:08,921 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 from AS 2024-08-14 08:21:21,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2566620.0, ans=0.125 2024-08-14 08:21:22,100 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS 2024-08-14 08:21:26,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10300, loss[loss=0.1161, beats_loss=0.00734, ecapa_loss=0.0001901, whisper_loss=0.1068, over 15206.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.0001601, whisper_loss=0.09197, over 3897977.46 frames. 
], batch size: 59, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:21:43,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2566820.0, ans=0.0 2024-08-14 08:21:46,526 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-14 08:21:57,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2566820.0, ans=0.125 2024-08-14 08:22:12,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2566920.0, ans=0.05 2024-08-14 08:22:19,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=8.0 2024-08-14 08:22:35,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2567120.0, ans=0.2 2024-08-14 08:22:51,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10350, loss[loss=0.1041, beats_loss=0.01251, ecapa_loss=0.000169, whisper_loss=0.08992, over 20987.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001595, whisper_loss=0.09145, over 3911736.16 frames. ], batch size: 85, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:23:00,843 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 08:23:02,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2567220.0, ans=0.125 2024-08-14 08:23:06,930 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2024-08-14 08:23:25,868 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
19 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-14 08:23:49,080 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 08:23:51,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.354e+01 2.584e+01 2.935e+01 4.636e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-14 08:24:08,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2567620.0, ans=0.0 2024-08-14 08:24:22,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10400, loss[loss=0.09119, beats_loss=0.01095, ecapa_loss=0.0001704, whisper_loss=0.07853, over 19939.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001587, whisper_loss=0.09084, over 3911647.75 frames. ], batch size: 83, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:24:32,035 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 08:24:48,684 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.459e+05 2024-08-14 08:24:52,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.61 vs. 
limit=22.5 2024-08-14 08:24:58,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2567920.0, ans=0.035 2024-08-14 08:25:04,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2567920.0, ans=0.125 2024-08-14 08:25:06,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2567920.0, ans=0.125 2024-08-14 08:25:14,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2568020.0, ans=0.125 2024-08-14 08:25:27,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2568020.0, ans=0.125 2024-08-14 08:25:31,382 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 08:25:35,516 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 08:25:43,212 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 08:25:49,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10450, loss[loss=0.1064, beats_loss=0.01004, ecapa_loss=0.0001589, whisper_loss=0.09477, over 21337.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001572, whisper_loss=0.09029, over 3911987.73 frames. ], batch size: 88, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:25:51,887 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 08:25:52,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2568220.0, ans=0.04949747468305833 2024-08-14 08:26:01,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2568220.0, ans=0.0 2024-08-14 08:26:16,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2568320.0, ans=0.2 2024-08-14 08:26:41,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.325e+01 2.620e+01 2.982e+01 4.539e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-14 08:27:00,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2568620.0, ans=0.125 2024-08-14 08:27:08,335 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10500, loss[loss=0.1078, beats_loss=0.009433, ecapa_loss=0.0001862, whisper_loss=0.09654, over 16931.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01072, ecapa_loss=0.0001575, whisper_loss=0.09012, over 3900397.76 frames. ], batch size: 68, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:27:15,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2568720.0, ans=0.0 2024-08-14 08:27:23,589 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 08:27:23,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2568820.0, ans=0.2 2024-08-14 08:27:23,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2568820.0, ans=0.09899494936611666 2024-08-14 08:27:40,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2568820.0, ans=0.125 2024-08-14 08:28:04,328 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 08:28:09,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2569020.0, ans=0.125 2024-08-14 08:28:15,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2569020.0, ans=0.1 2024-08-14 08:28:25,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2569120.0, ans=10.0 2024-08-14 08:28:41,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10550, loss[loss=0.1173, beats_loss=0.00998, ecapa_loss=0.000146, whisper_loss=0.1059, over 23502.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001568, whisper_loss=0.0907, over 3913749.87 frames. ], batch size: 93, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:28:41,690 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 08:28:48,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2569220.0, ans=0.125 2024-08-14 08:29:07,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. 
limit=15.0 2024-08-14 08:29:18,783 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 08:29:27,795 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 08:29:41,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.277e+01 2.540e+01 2.895e+01 1.094e+02, threshold=5.080e+01, percent-clipped=1.0 2024-08-14 08:29:53,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2569620.0, ans=0.0 2024-08-14 08:29:56,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2569620.0, ans=0.1 2024-08-14 08:30:06,830 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10600, loss[loss=0.09248, beats_loss=0.009815, ecapa_loss=0.0001769, whisper_loss=0.0809, over 18211.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001565, whisper_loss=0.09146, over 3914116.66 frames. ], batch size: 77, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:30:12,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2569720.0, ans=0.125 2024-08-14 08:30:12,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2569720.0, ans=0.125 2024-08-14 08:30:13,477 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 08:30:31,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2569820.0, ans=0.2 2024-08-14 08:30:43,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2569920.0, ans=0.125 2024-08-14 08:30:45,716 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 08:30:50,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2569920.0, ans=0.125 2024-08-14 08:31:00,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2570020.0, ans=0.2 2024-08-14 08:31:03,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=2570020.0, ans=0.2 2024-08-14 08:31:11,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2570120.0, ans=0.125 2024-08-14 08:31:13,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2570120.0, ans=0.2 2024-08-14 08:31:13,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-14 08:31:24,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10650, loss[loss=0.126, beats_loss=0.007829, ecapa_loss=0.0001748, whisper_loss=0.1164, over 23842.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001556, whisper_loss=0.09121, over 3922559.10 frames. ], batch size: 94, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:31:27,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-14 08:31:49,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2570320.0, ans=0.025 2024-08-14 08:31:52,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.72 vs. 
limit=15.0 2024-08-14 08:31:53,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2570320.0, ans=0.125 2024-08-14 08:32:04,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2570420.0, ans=0.1 2024-08-14 08:32:08,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2570420.0, ans=0.0 2024-08-14 08:32:11,918 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.429e-01 2024-08-14 08:32:12,899 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 08:32:13,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2570420.0, ans=0.0 2024-08-14 08:32:18,863 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.563e-03 2024-08-14 08:32:21,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.360e+01 2.616e+01 3.033e+01 9.241e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-14 08:32:33,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=12.0 2024-08-14 08:32:35,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2570620.0, ans=0.125 2024-08-14 08:32:52,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10700, loss[loss=0.09968, beats_loss=0.01124, ecapa_loss=0.0001551, whisper_loss=0.08689, over 18418.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0106, ecapa_loss=0.0001554, whisper_loss=0.09204, over 3930955.50 frames. 
], batch size: 73, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:33:06,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2570720.0, ans=0.0 2024-08-14 08:33:15,973 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 08:33:47,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2571020.0, ans=0.1 2024-08-14 08:33:48,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2571020.0, ans=0.1 2024-08-14 08:33:50,436 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 08:34:06,014 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:34:16,014 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=12.0 2024-08-14 08:34:21,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10750, loss[loss=0.1141, beats_loss=0.0117, ecapa_loss=0.0001471, whisper_loss=0.1009, over 15367.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01063, ecapa_loss=0.0001556, whisper_loss=0.0921, over 3926504.97 frames. ], batch size: 60, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:34:21,162 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 08:34:25,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2571220.0, ans=0.125 2024-08-14 08:34:29,886 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 08:34:33,487 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
28 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 08:34:38,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2571320.0, ans=0.2 2024-08-14 08:34:48,152 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 08:34:55,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2571420.0, ans=0.125 2024-08-14 08:35:00,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-14 08:35:13,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.456e+01 2.714e+01 3.010e+01 3.209e+02, threshold=5.428e+01, percent-clipped=1.0 2024-08-14 08:35:13,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2571520.0, ans=0.07 2024-08-14 08:35:18,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2571520.0, ans=0.125 2024-08-14 08:35:22,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-14 08:35:26,489 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 08:35:38,269 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10800, loss[loss=0.09012, beats_loss=0.01208, ecapa_loss=0.0001669, whisper_loss=0.07636, over 18665.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01055, ecapa_loss=0.0001563, whisper_loss=0.0926, over 3949764.17 frames. 
], batch size: 76, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:35:55,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2024-08-14 08:35:55,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.91 vs. limit=22.5 2024-08-14 08:36:20,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2571920.0, ans=0.125 2024-08-14 08:36:52,611 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10850, loss[loss=0.1022, beats_loss=0.01173, ecapa_loss=0.0001399, whisper_loss=0.0891, over 16612.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01063, ecapa_loss=0.0001559, whisper_loss=0.09207, over 3932839.04 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:36:52,762 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 27 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 08:37:13,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=15.0 2024-08-14 08:37:24,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2572420.0, ans=0.125 2024-08-14 08:37:45,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.429e+01 2.769e+01 3.241e+01 1.860e+02, threshold=5.537e+01, percent-clipped=2.0 2024-08-14 08:37:46,602 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 08:38:00,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2572620.0, ans=0.0 2024-08-14 08:38:06,873 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 08:38:17,640 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10900, loss[loss=0.1267, beats_loss=0.01014, ecapa_loss=0.0001562, whisper_loss=0.115, over 20839.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01064, ecapa_loss=0.0001552, whisper_loss=0.09254, over 3952650.15 frames. ], batch size: 80, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:38:36,101 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-14 08:38:42,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2024-08-14 08:38:54,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2024-08-14 08:39:09,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2572920.0, ans=0.2 2024-08-14 08:39:13,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2573020.0, ans=0.0 2024-08-14 08:39:22,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=12.0 2024-08-14 08:39:25,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2573020.0, ans=0.1 2024-08-14 08:39:33,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2573120.0, ans=0.0 2024-08-14 08:39:47,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 10950, loss[loss=0.1369, beats_loss=0.008984, ecapa_loss=0.0001959, whisper_loss=0.126, over 22570.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01055, ecapa_loss=0.0001565, whisper_loss=0.09315, over 3917182.63 frames. ], batch size: 92, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:40:02,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2573320.0, ans=0.125 2024-08-14 08:40:03,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2573320.0, ans=0.1 2024-08-14 08:40:16,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2573420.0, ans=0.0 2024-08-14 08:40:18,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0 2024-08-14 08:40:19,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=12.0 2024-08-14 08:40:21,707 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 08:40:24,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2573420.0, ans=0.0 2024-08-14 08:40:26,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2573420.0, ans=0.125 2024-08-14 08:40:29,885 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-14 08:40:37,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.432e+01 2.668e+01 2.934e+01 4.215e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 08:40:56,277 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 08:41:05,865 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11000, loss[loss=0.08158, beats_loss=0.0105, ecapa_loss=0.0002169, whisper_loss=0.06891, over 15214.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01059, ecapa_loss=0.0001582, whisper_loss=0.09251, over 3898742.89 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:41:06,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2573720.0, ans=0.0 2024-08-14 08:41:29,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2573820.0, ans=0.0 2024-08-14 08:42:06,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2574020.0, ans=0.125 2024-08-14 08:42:09,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2574020.0, ans=0.025 2024-08-14 08:42:09,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2574020.0, ans=0.125 2024-08-14 08:42:34,328 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 08:42:37,754 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 08:42:38,887 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11050, loss[loss=0.08825, beats_loss=0.01021, ecapa_loss=0.0001406, whisper_loss=0.07664, over 14863.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0106, ecapa_loss=0.0001575, whisper_loss=0.09227, over 3931945.66 frames. ], batch size: 55, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:42:52,476 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 08:43:01,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2574320.0, ans=0.5 2024-08-14 08:43:09,737 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 08:43:09,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2574320.0, ans=0.035 2024-08-14 08:43:17,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2574420.0, ans=0.125 2024-08-14 08:43:26,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0 2024-08-14 08:43:36,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2574520.0, ans=0.025 2024-08-14 08:43:41,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.313e+01 2.557e+01 2.807e+01 4.067e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 08:43:58,703 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 08:44:20,953 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11100, loss[loss=0.07367, beats_loss=0.01272, ecapa_loss=0.0001608, whisper_loss=0.05935, over 19592.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001587, whisper_loss=0.09145, over 3918343.66 frames. 
], batch size: 81, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:45:00,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2574820.0, ans=0.125 2024-08-14 08:45:37,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2575020.0, ans=0.5 2024-08-14 08:45:37,594 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.133e+02 2024-08-14 08:45:48,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=22.5 2024-08-14 08:46:09,949 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 08:46:13,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11150, loss[loss=0.1218, beats_loss=0.009951, ecapa_loss=0.0001763, whisper_loss=0.1101, over 19215.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001582, whisper_loss=0.09131, over 3885942.31 frames. ], batch size: 75, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:46:41,686 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 08:46:54,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2024-08-14 08:46:55,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2575320.0, ans=0.1 2024-08-14 08:47:04,722 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 08:47:29,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.286e+01 2.573e+01 3.032e+01 5.380e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-14 08:47:33,737 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-14 08:47:40,776 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 08:48:05,374 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 08:48:07,970 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 08:48:11,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11200, loss[loss=0.07126, beats_loss=0.01321, ecapa_loss=0.0001657, whisper_loss=0.05639, over 14118.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001585, whisper_loss=0.09116, over 3885926.37 frames. ], batch size: 58, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:48:24,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.07 vs. limit=10.0 2024-08-14 08:48:28,361 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 08:48:31,114 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 08:48:47,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2575820.0, ans=0.0 2024-08-14 08:49:02,285 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 08:49:04,076 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 08:49:08,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5 2024-08-14 08:49:33,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2576120.0, ans=0.1 2024-08-14 08:49:36,315 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11250, loss[loss=0.08841, beats_loss=0.01217, ecapa_loss=0.0001454, whisper_loss=0.07478, over 21848.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001588, whisper_loss=0.09083, over 3902755.90 frames. ], batch size: 89, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:49:53,726 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 34 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 08:49:55,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2576320.0, ans=0.035 2024-08-14 08:49:58,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2576320.0, ans=0.0 2024-08-14 08:49:58,687 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0 2024-08-14 08:50:21,198 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.21 vs. 
limit=12.0 2024-08-14 08:50:26,304 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.439e+01 2.714e+01 2.976e+01 1.044e+02, threshold=5.429e+01, percent-clipped=2.0 2024-08-14 08:50:35,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2576520.0, ans=0.0 2024-08-14 08:50:48,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2576620.0, ans=0.125 2024-08-14 08:50:54,394 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11300, loss[loss=0.1096, beats_loss=0.009123, ecapa_loss=0.0001365, whisper_loss=0.0991, over 16884.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001585, whisper_loss=0.09086, over 3908888.75 frames. ], batch size: 65, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:50:55,155 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 19 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 08:50:58,520 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 08:50:58,840 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:51:00,013 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 08:51:00,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2576720.0, ans=0.125 2024-08-14 08:51:08,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2576720.0, ans=0.1 2024-08-14 08:51:09,269 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 08:51:26,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2576920.0, ans=0.0 2024-08-14 08:51:32,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2576920.0, ans=0.125 2024-08-14 08:51:37,075 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 08:51:43,229 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 08:51:43,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2577020.0, ans=0.125 2024-08-14 08:51:45,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2577020.0, ans=0.125 2024-08-14 08:51:47,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.99 vs. limit=6.0 2024-08-14 08:51:51,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2577020.0, ans=0.0 2024-08-14 08:52:16,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11350, loss[loss=0.09019, beats_loss=0.01235, ecapa_loss=0.0001536, whisper_loss=0.0763, over 21520.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001583, whisper_loss=0.09017, over 3883312.84 frames. ], batch size: 89, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:52:20,017 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 08:52:23,502 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 08:52:24,897 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-14 08:52:42,884 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 08:52:48,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2577320.0, ans=0.125 2024-08-14 08:53:02,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2577420.0, ans=0.125 2024-08-14 08:53:14,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.319e+01 2.550e+01 2.857e+01 6.146e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-14 08:53:15,970 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 08:53:19,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2577520.0, ans=0.125 2024-08-14 08:53:29,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2577620.0, ans=0.0 2024-08-14 08:53:40,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11400, loss[loss=0.1133, beats_loss=0.01013, ecapa_loss=0.0001397, whisper_loss=0.1017, over 22764.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001583, whisper_loss=0.0899, over 3866624.86 frames. ], batch size: 88, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:53:41,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. 
limit=15.0 2024-08-14 08:53:42,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2577720.0, ans=0.07 2024-08-14 08:53:47,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2577720.0, ans=0.2 2024-08-14 08:54:10,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2577820.0, ans=0.1 2024-08-14 08:54:11,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2577920.0, ans=0.125 2024-08-14 08:54:13,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2577920.0, ans=0.125 2024-08-14 08:54:17,337 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 08:54:29,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2578020.0, ans=0.1 2024-08-14 08:54:32,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2578020.0, ans=0.1 2024-08-14 08:54:32,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2578020.0, ans=0.95 2024-08-14 08:54:32,665 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.033e-02 2024-08-14 08:54:41,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2578020.0, ans=0.0 2024-08-14 08:54:46,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2578120.0, ans=0.125 2024-08-14 08:54:54,524 INFO 
[scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2578120.0, ans=0.0 2024-08-14 08:54:58,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11450, loss[loss=0.108, beats_loss=0.01043, ecapa_loss=0.0001468, whisper_loss=0.09614, over 21355.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001572, whisper_loss=0.09033, over 3874823.23 frames. ], batch size: 84, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:55:00,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2578220.0, ans=0.1 2024-08-14 08:55:00,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2578220.0, ans=0.125 2024-08-14 08:55:13,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2578320.0, ans=0.07 2024-08-14 08:55:47,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2578520.0, ans=0.125 2024-08-14 08:55:48,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.444e+01 2.679e+01 2.887e+01 5.368e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-14 08:55:50,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2578520.0, ans=0.1 2024-08-14 08:56:11,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2578620.0, ans=0.2 2024-08-14 08:56:13,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11500, loss[loss=0.1231, beats_loss=0.01143, ecapa_loss=0.0001584, whisper_loss=0.1101, over 19885.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001572, whisper_loss=0.0905, over 3871677.78 frames. 
], batch size: 81, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:56:18,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2024-08-14 08:56:24,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2578720.0, ans=0.2 2024-08-14 08:56:25,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2578720.0, ans=0.125 2024-08-14 08:56:36,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2578820.0, ans=0.125 2024-08-14 08:56:38,268 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 08:56:42,818 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 08:56:45,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2578920.0, ans=0.125 2024-08-14 08:56:55,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2578920.0, ans=0.2 2024-08-14 08:57:15,670 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 08:57:28,735 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11550, loss[loss=0.11, beats_loss=0.01002, ecapa_loss=0.0001539, whisper_loss=0.09841, over 14429.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001573, whisper_loss=0.09109, over 3883731.25 frames. ], batch size: 57, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:57:39,055 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 08:57:40,642 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 08:57:51,965 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 08:58:08,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2579420.0, ans=0.0 2024-08-14 08:58:18,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.371e+01 2.639e+01 3.011e+01 4.840e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-14 08:58:24,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2579520.0, ans=0.0 2024-08-14 08:58:32,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2579620.0, ans=0.2 2024-08-14 08:58:41,430 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11600, loss[loss=0.08889, beats_loss=0.00964, ecapa_loss=0.0001513, whisper_loss=0.07774, over 18023.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001568, whisper_loss=0.09153, over 3888095.14 frames. ], batch size: 71, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:58:44,612 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 08:58:46,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2579720.0, ans=0.0 2024-08-14 08:58:58,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2024-08-14 08:59:01,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2579820.0, ans=0.125 2024-08-14 08:59:09,499 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
21 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-14 08:59:15,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2579920.0, ans=0.0 2024-08-14 08:59:25,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2580020.0, ans=0.5 2024-08-14 08:59:30,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2580020.0, ans=0.1 2024-08-14 08:59:36,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2580020.0, ans=0.0 2024-08-14 08:59:52,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2580220.0, ans=0.5 2024-08-14 08:59:53,119 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11650, loss[loss=0.09031, beats_loss=0.0125, ecapa_loss=0.0001379, whisper_loss=0.07643, over 21441.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001567, whisper_loss=0.09132, over 3897654.93 frames. ], batch size: 88, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:59:56,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2580220.0, ans=0.2 2024-08-14 09:00:20,252 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5 2024-08-14 09:00:23,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2580420.0, ans=10.0 2024-08-14 09:00:32,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. 
limit=15.0 2024-08-14 09:00:37,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2580520.0, ans=0.125 2024-08-14 09:00:41,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.343e+01 2.621e+01 2.855e+01 6.176e+01, threshold=5.243e+01, percent-clipped=1.0 2024-08-14 09:00:54,504 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 09:01:05,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2580720.0, ans=0.0 2024-08-14 09:01:06,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11700, loss[loss=0.09712, beats_loss=0.01153, ecapa_loss=0.0001507, whisper_loss=0.08408, over 18147.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001569, whisper_loss=0.09111, over 3933570.03 frames. ], batch size: 69, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:01:18,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2580720.0, ans=0.2 2024-08-14 09:01:22,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-14 09:01:27,182 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 09:01:31,799 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 09:01:36,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2580920.0, ans=0.1 2024-08-14 09:02:17,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.60 vs. 
limit=15.0 2024-08-14 09:02:18,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11750, loss[loss=0.09494, beats_loss=0.01147, ecapa_loss=0.0001489, whisper_loss=0.08198, over 18839.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001571, whisper_loss=0.0909, over 3929117.48 frames. ], batch size: 77, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:02:27,826 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-14 09:02:46,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.56 vs. limit=22.5 2024-08-14 09:03:05,654 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 39 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 09:03:06,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2581520.0, ans=0.04949747468305833 2024-08-14 09:03:07,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.345e+01 2.656e+01 2.861e+01 8.705e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-14 09:03:07,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2581520.0, ans=0.125 2024-08-14 09:03:11,946 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.594e-03 2024-08-14 09:03:25,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2581620.0, ans=0.04949747468305833 2024-08-14 09:03:30,094 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11800, loss[loss=0.1128, beats_loss=0.008149, ecapa_loss=0.000185, whisper_loss=0.1028, over 23399.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.0001579, whisper_loss=0.09093, over 3927668.74 frames. ], batch size: 94, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:04:19,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-08-14 09:04:25,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.70 vs. limit=10.0 2024-08-14 09:04:31,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2582020.0, ans=0.05 2024-08-14 09:04:35,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2582020.0, ans=0.125 2024-08-14 09:04:53,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2582120.0, ans=0.125 2024-08-14 09:04:57,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2582220.0, ans=0.125 2024-08-14 09:04:58,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11850, loss[loss=0.08435, beats_loss=0.011, ecapa_loss=0.0001791, whisper_loss=0.07156, over 21142.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001577, whisper_loss=0.09075, over 3937121.43 frames. ], batch size: 88, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:05:08,872 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-14 09:05:22,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2582320.0, ans=0.0 2024-08-14 09:05:36,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2582420.0, ans=0.0 2024-08-14 09:05:38,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2582420.0, ans=0.95 2024-08-14 09:05:41,047 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 09:05:49,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=2582420.0, ans=22.5 2024-08-14 09:05:57,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.66 vs. limit=15.0 2024-08-14 09:06:01,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.374e+01 2.631e+01 2.932e+01 6.705e+01, threshold=5.263e+01, percent-clipped=1.0 2024-08-14 09:06:03,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2582520.0, ans=0.0 2024-08-14 09:06:30,899 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11900, loss[loss=0.1194, beats_loss=0.006283, ecapa_loss=0.0002073, whisper_loss=0.1111, over 19096.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.0001577, whisper_loss=0.09112, over 3951024.13 frames. 
], batch size: 81, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:06:31,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2582720.0, ans=0.125 2024-08-14 09:06:35,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2582720.0, ans=0.125 2024-08-14 09:06:54,259 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 09:07:04,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2582820.0, ans=0.0 2024-08-14 09:07:24,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2582920.0, ans=0.2 2024-08-14 09:07:52,301 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 09:08:01,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 11950, loss[loss=0.08675, beats_loss=0.01182, ecapa_loss=0.0001713, whisper_loss=0.07322, over 17061.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001587, whisper_loss=0.0912, over 3924212.46 frames. ], batch size: 73, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:08:10,663 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 09:08:26,755 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 09:08:34,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2024-08-14 09:08:50,916 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 09:08:51,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2583520.0, ans=0.125 2024-08-14 09:08:51,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2583520.0, ans=0.1 2024-08-14 09:08:55,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.350e+01 2.661e+01 2.995e+01 4.363e+01, threshold=5.322e+01, percent-clipped=0.0 2024-08-14 09:09:03,144 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 09:09:12,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2583620.0, ans=0.0 2024-08-14 09:09:15,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2583620.0, ans=0.125 2024-08-14 09:09:16,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2583620.0, ans=0.1 2024-08-14 09:09:22,792 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12000, loss[loss=0.1104, beats_loss=0.01011, ecapa_loss=0.000188, whisper_loss=0.09841, over 18124.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01087, ecapa_loss=0.0001567, whisper_loss=0.09042, over 3891431.70 frames. ], batch size: 74, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:09:22,792 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 09:10:01,047 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005459, whisper_loss=0.2479, over 922467.00 frames. 
2024-08-14 09:10:19,103 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on SV_voxceleb1: loss=0.004372, beats_loss=0, ecapa_loss=0.0004372, whisper_loss=0, over 939242.00 frames. 2024-08-14 09:12:09,240 INFO [train_multi_KD3.py:1149] (1/4) Epoch 18, validation on AT_audioset: loss=0.02349, beats_loss=0.02349, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 09:12:09,244 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 09:12:18,580 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 09:12:32,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2024-08-14 09:12:38,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2024-08-14 09:12:39,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2583920.0, ans=0.0 2024-08-14 09:12:46,258 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 09:13:17,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2584120.0, ans=0.125 2024-08-14 09:13:20,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2584120.0, ans=0.0 2024-08-14 09:13:27,578 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12050, loss[loss=0.1119, beats_loss=0.01117, ecapa_loss=0.0001457, whisper_loss=0.09932, over 20649.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001571, whisper_loss=0.09059, over 3875463.24 frames. 
], batch size: 81, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:14:01,033 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 09:14:04,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2584420.0, ans=0.1 2024-08-14 09:14:05,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2584420.0, ans=0.125 2024-08-14 09:14:11,135 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 09:14:20,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.316e+01 2.502e+01 2.940e+01 4.119e+01, threshold=5.004e+01, percent-clipped=0.0 2024-08-14 09:14:21,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2584520.0, ans=0.125 2024-08-14 09:14:21,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2584520.0, ans=0.0 2024-08-14 09:14:38,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=12.0 2024-08-14 09:14:44,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12100, loss[loss=0.08503, beats_loss=0.01073, ecapa_loss=0.0001401, whisper_loss=0.07289, over 21523.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001578, whisper_loss=0.09083, over 3852681.84 frames. 
], batch size: 84, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:14:48,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2584720.0, ans=0.125 2024-08-14 09:14:57,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2584720.0, ans=0.125 2024-08-14 09:15:04,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2584820.0, ans=0.1 2024-08-14 09:16:00,122 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12150, loss[loss=0.1161, beats_loss=0.01032, ecapa_loss=0.0001564, whisper_loss=0.1042, over 23644.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.000157, whisper_loss=0.09102, over 3859245.05 frames. ], batch size: 93, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:16:08,031 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 09:16:35,177 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 09:16:36,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2585420.0, ans=0.125 2024-08-14 09:16:46,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-14 09:16:50,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.391e+01 2.592e+01 3.075e+01 2.876e+02, threshold=5.185e+01, percent-clipped=6.0 2024-08-14 09:16:53,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2585520.0, ans=0.125 2024-08-14 09:17:03,212 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 09:17:04,718 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 09:17:08,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2585620.0, ans=0.125 2024-08-14 09:17:10,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2585620.0, ans=0.125 2024-08-14 09:17:15,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12200, loss[loss=0.06171, beats_loss=0.01532, ecapa_loss=0.0001589, whisper_loss=0.0448, over 13685.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001568, whisper_loss=0.09131, over 3842991.46 frames. ], batch size: 60, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:17:27,555 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 09:17:32,422 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 09:17:39,307 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-14 09:17:53,672 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 09:17:53,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2585920.0, ans=0.125 2024-08-14 09:17:57,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2585920.0, ans=0.125 2024-08-14 09:18:10,560 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 09:18:16,187 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
26 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 09:18:22,035 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 09:18:23,352 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06917443126440048, model_norm_threshold=51.84561538696289 2024-08-14 09:18:23,547 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.256e+05, grad_sumsq=1.256e+05, orig_rms_sq=1.000e+00 2024-08-14 09:18:24,069 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 09:18:29,323 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12250, loss[loss=0.1036, beats_loss=0.01107, ecapa_loss=0.0001435, whisper_loss=0.0911, over 22956.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01057, ecapa_loss=0.0001578, whisper_loss=0.09196, over 3853497.32 frames. ], batch size: 92, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:18:37,232 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 09:18:42,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2586220.0, ans=0.2 2024-08-14 09:19:23,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.394e+01 2.719e+01 3.099e+01 7.495e+02, threshold=5.439e+01, percent-clipped=1.0 2024-08-14 09:19:31,886 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-14 09:19:47,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12300, loss[loss=0.08482, beats_loss=0.01028, ecapa_loss=0.0001273, whisper_loss=0.07327, over 15323.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001579, whisper_loss=0.09204, over 3872391.55 frames. 
], batch size: 57, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:19:50,066 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 from AS
2024-08-14 09:19:51,557 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 32 from Vox, 31 from AS
2024-08-14 09:19:53,576 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 30 from Vox, 26 from AS
2024-08-14 09:19:59,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2586720.0, ans=0.125
2024-08-14 09:20:06,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2586820.0, ans=0.125
2024-08-14 09:20:16,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2586820.0, ans=0.0
2024-08-14 09:20:32,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2586920.0, ans=0.0
2024-08-14 09:21:22,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12350, loss[loss=0.09782, beats_loss=0.009544, ecapa_loss=0.0001583, whisper_loss=0.08669, over 16149.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01058, ecapa_loss=0.0001577, whisper_loss=0.09221, over 3880993.45 frames. ], batch size: 63, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:21:25,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2587220.0, ans=0.1
2024-08-14 09:21:35,022 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 09:22:06,414 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 from AS
2024-08-14 09:22:16,671 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts.
24 from LS+wenet, 16 from Vox, 20 from AS
2024-08-14 09:22:24,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.397e+01 2.618e+01 2.960e+01 3.782e+01, threshold=5.235e+01, percent-clipped=0.0
2024-08-14 09:22:32,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2587620.0, ans=15.0
2024-08-14 09:22:37,810 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 from AS
2024-08-14 09:22:47,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12400, loss[loss=0.09487, beats_loss=0.01307, ecapa_loss=0.0001435, whisper_loss=0.08036, over 15736.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001574, whisper_loss=0.09166, over 3876794.55 frames. ], batch size: 62, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:23:04,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2587820.0, ans=0.125
2024-08-14 09:23:06,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=12.0
2024-08-14 09:23:08,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2587820.0, ans=0.125
2024-08-14 09:23:09,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2587820.0, ans=0.1
2024-08-14 09:23:19,598 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts.
25 from LS+wenet, 27 from Vox, 40 from AS
2024-08-14 09:23:27,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2587920.0, ans=0.0
2024-08-14 09:23:47,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2588120.0, ans=0.2
2024-08-14 09:23:49,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0
2024-08-14 09:23:59,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2588120.0, ans=0.0
2024-08-14 09:24:01,817 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 26 from Vox, 34 from AS
2024-08-14 09:24:02,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12450, loss[loss=0.1217, beats_loss=0.009548, ecapa_loss=0.0001676, whisper_loss=0.1104, over 24238.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001578, whisper_loss=0.09103, over 3887681.94 frames. ], batch size: 94, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:24:04,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2588220.0, ans=0.0
2024-08-14 09:24:20,055 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2024-08-14 09:24:22,774 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 32 from LS+wenet, 21 from Vox, 30 from AS
2024-08-14 09:24:23,139 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:24:25,748 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-14 09:24:31,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2588420.0, ans=0.0
2024-08-14 09:24:48,177 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 from AS
2024-08-14 09:24:50,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2588520.0, ans=0.125
2024-08-14 09:24:55,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.420e+01 2.657e+01 3.140e+01 9.625e+01, threshold=5.314e+01, percent-clipped=1.0
2024-08-14 09:25:18,888 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12500, loss[loss=0.08528, beats_loss=0.01411, ecapa_loss=0.0001386, whisper_loss=0.06978, over 21889.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.000157, whisper_loss=0.0906, over 3910092.53 frames. ], batch size: 92, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:25:30,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2588720.0, ans=0.125
2024-08-14 09:25:38,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2588820.0, ans=0.025
2024-08-14 09:25:50,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2588920.0, ans=0.1
2024-08-14 09:25:50,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2588920.0, ans=0.0
2024-08-14 09:25:54,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2588920.0, ans=0.125
2024-08-14 09:25:57,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob,
batch_count=2588920.0, ans=0.125
2024-08-14 09:26:24,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2589120.0, ans=0.0
2024-08-14 09:26:26,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2589120.0, ans=0.0
2024-08-14 09:26:26,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0
2024-08-14 09:26:30,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2589120.0, ans=0.1
2024-08-14 09:26:35,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12550, loss[loss=0.1186, beats_loss=0.01014, ecapa_loss=0.0001567, whisper_loss=0.1069, over 19701.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001574, whisper_loss=0.09076, over 3912379.45 frames. ], batch size: 77, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:26:53,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2589320.0, ans=0.1
2024-08-14 09:26:55,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2589320.0, ans=0.1
2024-08-14 09:27:29,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.466e+01 2.734e+01 3.063e+01 5.302e+01, threshold=5.468e+01, percent-clipped=0.0
2024-08-14 09:27:36,797 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.025e+01
2024-08-14 09:27:52,010 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts.
14 from LS+wenet, 15 from Vox, 26 from AS
2024-08-14 09:27:54,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12600, loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001556, whisper_loss=0.09044, over 20808.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001571, whisper_loss=0.09052, over 3880960.63 frames. ], batch size: 83, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:28:03,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2589720.0, ans=0.125
2024-08-14 09:28:10,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2589720.0, ans=0.125
2024-08-14 09:28:36,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5
2024-08-14 09:29:07,481 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03
2024-08-14 09:29:12,804 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 15 from LS+wenet, 20 from Vox, 38 from AS
2024-08-14 09:29:28,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2590220.0, ans=0.0
2024-08-14 09:29:29,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12650, loss[loss=0.0971, beats_loss=0.01245, ecapa_loss=0.0001372, whisper_loss=0.08328, over 23535.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001569, whisper_loss=0.09091, over 3907593.96 frames. ], batch size: 95, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:30:00,235 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts.
28 from LS+wenet, 30 from Vox, 38 from AS
2024-08-14 09:30:10,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2590420.0, ans=0.125
2024-08-14 09:30:18,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0
2024-08-14 09:30:19,593 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 from AS
2024-08-14 09:30:25,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2590420.0, ans=0.1
2024-08-14 09:30:27,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2590520.0, ans=0.1
2024-08-14 09:30:31,026 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 15 from Vox, 34 from AS
2024-08-14 09:30:37,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.252e+01 2.572e+01 2.889e+01 4.246e+01, threshold=5.144e+01, percent-clipped=0.0
2024-08-14 09:30:45,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2590620.0, ans=0.0
2024-08-14 09:30:57,726 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.465e+01
2024-08-14 09:31:01,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12700, loss[loss=0.1127, beats_loss=0.006982, ecapa_loss=0.0001496, whisper_loss=0.1042, over 15255.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001566, whisper_loss=0.0911, over 3900659.53 frames. ], batch size: 58, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:31:19,139 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts.
17 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 09:31:24,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2590820.0, ans=0.1
2024-08-14 09:32:15,724 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12750, loss[loss=0.0866, beats_loss=0.01346, ecapa_loss=0.0001303, whisper_loss=0.07184, over 21784.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001557, whisper_loss=0.0903, over 3875452.10 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:32:21,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0
2024-08-14 09:32:22,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0
2024-08-14 09:32:28,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2591220.0, ans=0.1
2024-08-14 09:32:37,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2591320.0, ans=0.1
2024-08-14 09:32:52,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2591420.0, ans=0.125
2024-08-14 09:32:54,472 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
27 from LS+wenet, 19 from Vox, 45 from AS
2024-08-14 09:33:05,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2591520.0, ans=0.1
2024-08-14 09:33:07,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.400e+01 2.605e+01 3.000e+01 2.756e+02, threshold=5.209e+01, percent-clipped=1.0
2024-08-14 09:33:12,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2591520.0, ans=0.125
2024-08-14 09:33:18,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2591620.0, ans=0.125
2024-08-14 09:33:21,399 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 21 from Vox, 20 from AS
2024-08-14 09:33:30,126 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12800, loss[loss=0.1162, beats_loss=0.008425, ecapa_loss=0.0002044, whisper_loss=0.1057, over 20782.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001576, whisper_loss=0.09098, over 3863655.02 frames. ], batch size: 87, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:34:19,814 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:34:21,035 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 from AS
2024-08-14 09:34:22,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2592020.0, ans=0.125
2024-08-14 09:34:25,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2592020.0, ans=0.09899494936611666
2024-08-14 09:34:31,382 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts.
25 from LS+wenet, 26 from Vox, 34 from AS
2024-08-14 09:34:31,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2592120.0, ans=0.0
2024-08-14 09:34:31,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0
2024-08-14 09:35:17,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12850, loss[loss=0.09511, beats_loss=0.01003, ecapa_loss=0.0001959, whisper_loss=0.08312, over 15149.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001579, whisper_loss=0.09027, over 3846716.51 frames. ], batch size: 63, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:35:34,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0
2024-08-14 09:35:43,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2592320.0, ans=0.0
2024-08-14 09:35:45,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0
2024-08-14 09:35:49,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0
2024-08-14 09:35:54,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2592320.0, ans=0.0
2024-08-14 09:36:05,030 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts.
19 from LS+wenet, 11 from Vox, 30 from AS
2024-08-14 09:36:05,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2592420.0, ans=0.2
2024-08-14 09:36:09,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2592420.0, ans=0.05
2024-08-14 09:36:10,642 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 from AS
2024-08-14 09:36:13,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2592420.0, ans=0.2
2024-08-14 09:36:23,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.339e+01 2.541e+01 2.791e+01 1.384e+02, threshold=5.082e+01, percent-clipped=3.0
2024-08-14 09:36:42,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2592620.0, ans=0.0
2024-08-14 09:36:46,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12900, loss[loss=0.1161, beats_loss=0.008126, ecapa_loss=0.0002104, whisper_loss=0.1059, over 18452.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01077, ecapa_loss=0.0001593, whisper_loss=0.08997, over 3822105.89 frames. ], batch size: 80, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:36:50,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2592720.0, ans=0.2
2024-08-14 09:37:00,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2592720.0, ans=0.125
2024-08-14 09:37:15,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2592820.0, ans=0.1
2024-08-14 09:37:23,003 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts.
17 from LS+wenet, 19 from Vox, 19 from AS
2024-08-14 09:37:29,260 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 18 from Vox, 15 from AS
2024-08-14 09:38:08,757 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS
2024-08-14 09:38:23,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2593120.0, ans=0.1
2024-08-14 09:38:37,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 12950, loss[loss=0.1107, beats_loss=0.009305, ecapa_loss=0.0001752, whisper_loss=0.09969, over 21064.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.000159, whisper_loss=0.09015, over 3821846.78 frames. ], batch size: 85, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:38:38,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2593220.0, ans=0.125
2024-08-14 09:38:43,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2593220.0, ans=0.125
2024-08-14 09:39:00,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2593320.0, ans=0.125
2024-08-14 09:39:39,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0
2024-08-14 09:39:50,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.412e+01 2.710e+01 3.075e+01 4.932e+01, threshold=5.420e+01, percent-clipped=0.0
2024-08-14 09:40:12,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2593620.0, ans=0.0
2024-08-14 09:40:13,723 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts.
14 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 09:40:27,822 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13000, loss[loss=0.09671, beats_loss=0.008923, ecapa_loss=0.0001973, whisper_loss=0.08581, over 18583.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.0001583, whisper_loss=0.09049, over 3841754.23 frames. ], batch size: 80, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:40:29,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2593720.0, ans=0.0
2024-08-14 09:40:34,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2593720.0, ans=0.125
2024-08-14 09:40:47,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2593720.0, ans=0.0
2024-08-14 09:40:57,284 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 09:41:06,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2593820.0, ans=0.1
2024-08-14 09:41:15,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2593920.0, ans=0.1
2024-08-14 09:41:15,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2593920.0, ans=0.125
2024-08-14 09:41:40,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2594020.0, ans=0.1
2024-08-14 09:41:44,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2594020.0, ans=0.125
2024-08-14 09:42:14,520 INFO [scaling.py:214] (1/4) ScheduledFloat:
name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2594120.0, ans=0.125
2024-08-14 09:42:22,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13050, loss[loss=0.1067, beats_loss=0.01201, ecapa_loss=0.0001718, whisper_loss=0.09298, over 21343.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001573, whisper_loss=0.09036, over 3853252.79 frames. ], batch size: 90, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:42:33,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2594220.0, ans=0.0
2024-08-14 09:42:50,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2594320.0, ans=0.0
2024-08-14 09:43:02,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2594420.0, ans=0.0
2024-08-14 09:43:07,135 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 from AS
2024-08-14 09:43:20,427 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
25 from LS+wenet, 19 from Vox, 45 from AS
2024-08-14 09:43:32,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.338e+01 2.607e+01 2.942e+01 4.688e+01, threshold=5.215e+01, percent-clipped=0.0
2024-08-14 09:43:40,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2594520.0, ans=0.125
2024-08-14 09:43:42,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2594620.0, ans=0.125
2024-08-14 09:43:45,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2594620.0, ans=0.125
2024-08-14 09:43:52,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2594620.0, ans=0.125
2024-08-14 09:44:01,396 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:44:03,060 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13100, loss[loss=0.1145, beats_loss=0.009309, ecapa_loss=0.000161, whisper_loss=0.1036, over 21480.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.0001567, whisper_loss=0.08981, over 3860730.41 frames. ], batch size: 84, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:44:18,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2594720.0, ans=0.125
2024-08-14 09:44:24,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.88 vs.
limit=15.0
2024-08-14 09:44:26,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2594820.0, ans=0.125
2024-08-14 09:44:32,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2594820.0, ans=0.125
2024-08-14 09:44:52,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2594920.0, ans=0.125
2024-08-14 09:44:59,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=12.0
2024-08-14 09:45:01,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2595020.0, ans=0.125
2024-08-14 09:45:12,039 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.825e-02
2024-08-14 09:45:29,475 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13150, loss[loss=0.07255, beats_loss=0.009671, ecapa_loss=0.0001893, whisper_loss=0.06098, over 15241.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01076, ecapa_loss=0.0001563, whisper_loss=0.08986, over 3829862.41 frames. ], batch size: 61, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:45:32,416 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 20 from Vox, 41 from AS
2024-08-14 09:45:36,491 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 15 from LS+wenet, 25 from Vox, 33 from AS
2024-08-14 09:45:42,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2595320.0, ans=0.0
2024-08-14 09:45:45,464 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts.
19 from LS+wenet, 18 from Vox, 42 from AS
2024-08-14 09:45:58,886 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:46:05,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2595420.0, ans=0.0
2024-08-14 09:46:19,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2595520.0, ans=0.0
2024-08-14 09:46:20,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.308e+01 2.613e+01 2.975e+01 4.681e+01, threshold=5.226e+01, percent-clipped=0.0
2024-08-14 09:46:42,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13200, loss[loss=0.09417, beats_loss=0.01167, ecapa_loss=0.0001714, whisper_loss=0.08078, over 20287.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001582, whisper_loss=0.08963, over 3846775.18 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:46:54,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2595720.0, ans=0.125
2024-08-14 09:47:11,362 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 19 from Vox, 18 from AS
2024-08-14 09:47:35,434 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-14 09:47:37,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2596020.0, ans=0.125
2024-08-14 09:47:49,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2596020.0, ans=0.1
2024-08-14 09:47:49,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2596020.0, ans=0.0
2024-08-14 09:48:00,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0
2024-08-14 09:48:04,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2596120.0, ans=0.125
2024-08-14 09:48:07,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0
2024-08-14 09:48:08,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0
2024-08-14 09:48:16,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13250, loss[loss=0.07347, beats_loss=0.009602, ecapa_loss=0.0001387, whisper_loss=0.06248, over 15708.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01074, ecapa_loss=0.0001584, whisper_loss=0.08936, over 3819632.70 frames. ], batch size: 61, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:48:57,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2596420.0, ans=0.2
2024-08-14 09:49:12,542 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts.
16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 09:49:19,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2024-08-14 09:49:20,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2596520.0, ans=0.125 2024-08-14 09:49:21,888 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 09:49:26,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.423e+01 2.644e+01 3.025e+01 2.002e+02, threshold=5.289e+01, percent-clipped=3.0 2024-08-14 09:49:38,942 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 09:49:53,271 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 09:49:57,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13300, loss[loss=0.09306, beats_loss=0.0103, ecapa_loss=0.0001568, whisper_loss=0.0812, over 14842.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001578, whisper_loss=0.09018, over 3804583.90 frames. ], batch size: 60, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:50:17,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2596820.0, ans=0.5 2024-08-14 09:50:25,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.30 vs. 
limit=12.0 2024-08-14 09:50:36,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2596820.0, ans=0.1 2024-08-14 09:50:37,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2596820.0, ans=0.0 2024-08-14 09:51:16,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2597120.0, ans=0.0 2024-08-14 09:51:18,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0 2024-08-14 09:51:23,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2597120.0, ans=0.125 2024-08-14 09:51:31,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13350, loss[loss=0.1103, beats_loss=0.01096, ecapa_loss=0.0001462, whisper_loss=0.09785, over 14914.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001577, whisper_loss=0.09027, over 3839289.46 frames. ], batch size: 57, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:51:34,372 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 09:52:22,279 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 28 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 09:52:24,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2597520.0, ans=0.125 2024-08-14 09:52:24,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.96 vs. 
limit=10.0 2024-08-14 09:52:25,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.409e+01 2.670e+01 3.063e+01 5.921e+01, threshold=5.339e+01, percent-clipped=1.0 2024-08-14 09:52:30,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2597520.0, ans=0.0 2024-08-14 09:52:45,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2597620.0, ans=0.125 2024-08-14 09:52:47,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13400, loss[loss=0.1043, beats_loss=0.01052, ecapa_loss=0.0001685, whisper_loss=0.09207, over 22334.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001574, whisper_loss=0.09091, over 3847015.96 frames. ], batch size: 92, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:52:53,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2597720.0, ans=0.0 2024-08-14 09:52:59,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2597720.0, ans=0.125 2024-08-14 09:53:15,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2597820.0, ans=0.125 2024-08-14 09:53:16,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2597820.0, ans=0.125 2024-08-14 09:53:32,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2597920.0, ans=0.035 2024-08-14 09:53:37,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2598020.0, ans=0.07 2024-08-14 09:53:42,062 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2598020.0, ans=0.125 2024-08-14 09:53:54,950 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 09:53:59,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2598120.0, ans=0.125 2024-08-14 09:54:06,826 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13450, loss[loss=0.09066, beats_loss=0.01146, ecapa_loss=0.0001058, whisper_loss=0.07814, over 16858.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001561, whisper_loss=0.09027, over 3854613.83 frames. ], batch size: 64, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:54:18,332 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 09:54:24,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2598320.0, ans=0.125 2024-08-14 09:54:35,961 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 09:54:48,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2598420.0, ans=0.125 2024-08-14 09:54:59,305 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.340e+01 2.558e+01 2.955e+01 5.061e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-14 09:55:09,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2598620.0, ans=0.0 2024-08-14 09:55:17,196 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 09:55:20,854 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13500, loss[loss=0.1113, beats_loss=0.009713, ecapa_loss=0.0001429, whisper_loss=0.1002, over 22586.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001567, whisper_loss=0.09062, over 3852404.89 frames. ], batch size: 84, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:55:53,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2598920.0, ans=0.125 2024-08-14 09:56:01,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2598920.0, ans=0.125 2024-08-14 09:56:08,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2599020.0, ans=0.125 2024-08-14 09:56:09,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2599020.0, ans=0.125 2024-08-14 09:56:14,046 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 09:56:15,433 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 16 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 09:56:33,442 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13550, loss[loss=0.1179, beats_loss=0.008715, ecapa_loss=0.000181, whisper_loss=0.1073, over 17127.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.09063, over 3876608.94 frames. ], batch size: 67, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:56:44,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2599220.0, ans=0.1 2024-08-14 09:56:54,129 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 13 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 09:56:56,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.77 vs. 
limit=22.5 2024-08-14 09:56:56,655 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 09:57:24,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.303e+01 2.544e+01 2.977e+01 7.464e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-14 09:57:38,520 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 18 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-14 09:57:38,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2599620.0, ans=0.125 2024-08-14 09:57:40,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-14 09:57:45,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13600, loss[loss=0.09303, beats_loss=0.01157, ecapa_loss=0.0002024, whisper_loss=0.07944, over 20442.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01072, ecapa_loss=0.0001563, whisper_loss=0.09011, over 3844252.98 frames. ], batch size: 85, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:58:12,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2599820.0, ans=0.2 2024-08-14 09:58:23,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-14 09:58:23,975 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 09:58:31,098 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 09:58:34,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2600020.0, ans=0.125 2024-08-14 09:58:36,639 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 09:58:39,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2600020.0, ans=0.1 2024-08-14 09:58:44,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2600020.0, ans=0.125 2024-08-14 09:58:49,900 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 09:58:55,811 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 09:59:01,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13650, loss[loss=0.1229, beats_loss=0.008123, ecapa_loss=0.0001644, whisper_loss=0.1131, over 22485.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001556, whisper_loss=0.08991, over 3838711.99 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:59:01,557 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 09:59:14,616 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 09:59:17,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2600320.0, ans=0.125 2024-08-14 09:59:18,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2600320.0, ans=0.0 2024-08-14 09:59:19,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2600320.0, ans=0.05 2024-08-14 09:59:20,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2600320.0, ans=0.125 2024-08-14 09:59:24,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2600320.0, ans=0.2 2024-08-14 09:59:28,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-14 09:59:33,039 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 09:59:38,552 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 09:59:51,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+01 2.361e+01 2.642e+01 3.028e+01 5.099e+01, threshold=5.285e+01, percent-clipped=1.0 2024-08-14 10:00:07,643 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 10:00:08,179 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.930e+01 2024-08-14 10:00:13,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13700, loss[loss=0.1094, beats_loss=0.01041, ecapa_loss=0.0001384, whisper_loss=0.09757, over 16907.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.0108, ecapa_loss=0.0001559, whisper_loss=0.09113, over 3889957.61 frames. ], batch size: 63, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:00:28,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2600820.0, ans=0.09899494936611666 2024-08-14 10:00:31,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2024-08-14 10:01:00,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2601020.0, ans=0.1 2024-08-14 10:01:01,498 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 10:01:22,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0 2024-08-14 10:01:26,308 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13750, loss[loss=0.1233, beats_loss=0.008766, ecapa_loss=0.0001632, whisper_loss=0.1129, over 22658.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01074, ecapa_loss=0.0001574, whisper_loss=0.09194, over 3891684.77 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:01:26,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2601220.0, ans=0.125 2024-08-14 10:01:30,821 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 10:01:33,649 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 23 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-14 10:01:36,460 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 10:01:40,928 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
19 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 10:01:55,880 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 10:02:17,139 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.453e+01 2.720e+01 3.144e+01 4.957e+01, threshold=5.441e+01, percent-clipped=0.0 2024-08-14 10:02:17,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2601520.0, ans=0.125 2024-08-14 10:02:33,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-14 10:02:39,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13800, loss[loss=0.1201, beats_loss=0.01109, ecapa_loss=0.0001246, whisper_loss=0.1077, over 19577.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001557, whisper_loss=0.09129, over 3845474.31 frames. ], batch size: 73, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:02:54,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2601820.0, ans=0.125 2024-08-14 10:03:14,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2601920.0, ans=0.2 2024-08-14 10:03:25,678 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0 2024-08-14 10:03:30,342 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 10:03:46,156 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 10:03:51,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13850, loss[loss=0.1207, beats_loss=0.009043, ecapa_loss=0.0001756, whisper_loss=0.1099, over 22200.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001555, whisper_loss=0.09109, over 3871662.39 frames. ], batch size: 86, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:03:51,834 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 28 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-14 10:03:53,282 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 10:04:08,851 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 10:04:20,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2602420.0, ans=0.125 2024-08-14 10:04:29,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0 2024-08-14 10:04:30,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2024-08-14 10:04:34,319 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 10:04:40,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.427e+01 2.695e+01 2.897e+01 4.823e+02, threshold=5.391e+01, percent-clipped=1.0 2024-08-14 10:04:59,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2602620.0, ans=0.125 2024-08-14 10:05:02,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13900, loss[loss=0.1088, beats_loss=0.007494, ecapa_loss=0.0001641, whisper_loss=0.09963, over 17154.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001558, whisper_loss=0.09174, over 3876405.71 frames. ], batch size: 64, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:05:04,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2602720.0, ans=0.0 2024-08-14 10:05:13,523 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 10:05:16,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2602820.0, ans=0.125 2024-08-14 10:05:25,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2602820.0, ans=0.0 2024-08-14 10:05:35,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2602920.0, ans=0.0 2024-08-14 10:05:47,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.99 vs. limit=15.0 2024-08-14 10:06:02,459 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 10:06:05,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2603120.0, ans=0.0 2024-08-14 10:06:06,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2603120.0, ans=0.125 2024-08-14 10:06:14,457 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.50 vs. limit=22.5 2024-08-14 10:06:15,024 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 13950, loss[loss=0.09079, beats_loss=0.009944, ecapa_loss=0.0001901, whisper_loss=0.07895, over 14965.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.0001558, whisper_loss=0.09197, over 3907128.59 frames. ], batch size: 62, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:06:21,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2603220.0, ans=0.0 2024-08-14 10:06:37,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2603320.0, ans=0.1 2024-08-14 10:06:39,350 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 10:06:49,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2603420.0, ans=0.0 2024-08-14 10:06:56,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2603520.0, ans=0.05 2024-08-14 10:07:04,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.381e+01 2.587e+01 2.937e+01 5.454e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-14 10:07:06,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2603520.0, ans=0.125 2024-08-14 10:07:24,914 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 10:07:26,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14000, loss[loss=0.09005, beats_loss=0.01346, ecapa_loss=0.0001331, whisper_loss=0.07526, over 17107.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001541, whisper_loss=0.09136, over 3920393.92 frames. ], batch size: 69, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:07:30,566 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 10:07:34,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-14 10:07:41,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2603820.0, ans=0.1 2024-08-14 10:07:52,400 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-14 10:07:55,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2603920.0, ans=0.04949747468305833 2024-08-14 10:08:00,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-08-14 10:08:02,821 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 10:08:12,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2604020.0, ans=0.125 2024-08-14 10:08:17,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2604020.0, ans=0.0 2024-08-14 10:08:26,386 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 10:08:38,160 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 10:08:39,327 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14050, loss[loss=0.113, beats_loss=0.009457, ecapa_loss=0.0001777, whisper_loss=0.1017, over 18129.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001546, whisper_loss=0.09092, over 3900528.38 frames. 
], batch size: 72, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:08:39,658 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 10:08:39,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2604220.0, ans=0.125 2024-08-14 10:08:46,615 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 10:08:51,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2604220.0, ans=0.125 2024-08-14 10:09:07,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2604420.0, ans=0.125 2024-08-14 10:09:28,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2604520.0, ans=0.125 2024-08-14 10:09:29,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.346e+01 2.567e+01 2.904e+01 5.000e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-14 10:09:38,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2604620.0, ans=0.125 2024-08-14 10:09:50,854 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14100, loss[loss=0.1059, beats_loss=0.01194, ecapa_loss=0.000162, whisper_loss=0.09237, over 19404.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001542, whisper_loss=0.09131, over 3883621.29 frames. 
], batch size: 79, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:09:52,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2604720.0, ans=10.0 2024-08-14 10:10:06,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-14 10:10:10,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-14 10:10:25,272 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 10:10:28,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2604920.0, ans=0.0 2024-08-14 10:10:44,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2605020.0, ans=0.125 2024-08-14 10:11:00,581 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 10:11:03,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14150, loss[loss=0.08716, beats_loss=0.01064, ecapa_loss=0.0001733, whisper_loss=0.07479, over 13511.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.0001547, whisper_loss=0.09055, over 3865705.56 frames. ], batch size: 56, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:11:05,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-14 10:11:11,862 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 10:11:30,999 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 from AS
2024-08-14 10:11:31,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2605420.0, ans=0.125
2024-08-14 10:11:46,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2605520.0, ans=0.2
2024-08-14 10:11:53,304 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.390e+01 2.553e+01 2.829e+01 7.364e+01, threshold=5.106e+01, percent-clipped=2.0
2024-08-14 10:11:56,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2605520.0, ans=0.125
2024-08-14 10:11:59,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2605620.0, ans=0.125
2024-08-14 10:12:10,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2605620.0, ans=0.125
2024-08-14 10:12:15,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14200, loss[loss=0.1176, beats_loss=0.01128, ecapa_loss=0.0001449, whisper_loss=0.1049, over 23593.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001546, whisper_loss=0.09162, over 3857707.37 frames. ], batch size: 90, lr: 3.43e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:12:22,967 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 10:12:42,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0
2024-08-14 10:12:44,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2605920.0, ans=0.0
2024-08-14 10:12:47,201 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS
2024-08-14 10:12:54,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2605920.0, ans=0.125
2024-08-14 10:12:58,941 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 from AS
2024-08-14 10:13:07,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2606020.0, ans=0.125
2024-08-14 10:13:08,920 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 from AS
2024-08-14 10:13:27,332 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14250, loss[loss=0.09684, beats_loss=0.01125, ecapa_loss=0.000155, whisper_loss=0.08404, over 18119.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01067, ecapa_loss=0.0001557, whisper_loss=0.09231, over 3866346.91 frames. ], batch size: 71, lr: 3.43e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:13:30,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2606220.0, ans=0.1
2024-08-14 10:13:31,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2606220.0, ans=0.0
2024-08-14 10:13:36,512 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 33 from LS+wenet, 13 from Vox, 28 from AS
2024-08-14 10:13:41,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2606320.0, ans=0.0
2024-08-14 10:13:47,883 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 25 from Vox, 30 from AS
2024-08-14 10:13:58,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0
2024-08-14 10:14:17,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2606520.0, ans=0.0
2024-08-14 10:14:17,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0
2024-08-14 10:14:18,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.438e+01 2.661e+01 3.044e+01 6.273e+01, threshold=5.322e+01, percent-clipped=2.0
2024-08-14 10:14:27,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0
2024-08-14 10:14:39,890 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14300, loss[loss=0.1054, beats_loss=0.009809, ecapa_loss=0.000184, whisper_loss=0.09379, over 21765.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.0001556, whisper_loss=0.09127, over 3878738.25 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:15:07,417 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS
2024-08-14 10:15:17,681 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 22 from Vox, 28 from AS
2024-08-14 10:15:19,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2606920.0, ans=0.0
2024-08-14 10:15:55,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14350, loss[loss=0.09341, beats_loss=0.009454, ecapa_loss=0.0001428, whisper_loss=0.08253, over 22377.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.000156, whisper_loss=0.09095, over 3851329.69 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:15:58,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2607220.0, ans=0.1
2024-08-14 10:16:01,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2607220.0, ans=0.125
2024-08-14 10:16:05,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2607220.0, ans=0.125
2024-08-14 10:16:11,608 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 15 from LS+wenet, 27 from Vox, 33 from AS
2024-08-14 10:16:27,930 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0
2024-08-14 10:16:34,286 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 19 from Vox, 38 from AS
2024-08-14 10:16:34,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2607420.0, ans=0.0
2024-08-14 10:16:56,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.361e+01 2.607e+01 2.997e+01 4.259e+01, threshold=5.213e+01, percent-clipped=0.0
2024-08-14 10:17:02,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2607520.0, ans=0.0
2024-08-14 10:17:04,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0
2024-08-14 10:17:05,511 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS
2024-08-14 10:17:23,456 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14400, loss[loss=0.1002, beats_loss=0.009551, ecapa_loss=0.0001651, whisper_loss=0.08899, over 17136.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001569, whisper_loss=0.09089, over 3884904.14 frames. ], batch size: 70, lr: 3.43e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:18:05,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0
2024-08-14 10:18:08,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2607920.0, ans=0.1
2024-08-14 10:18:23,091 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 15 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 10:18:26,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.48 vs. limit=22.5
2024-08-14 10:18:29,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2608020.0, ans=0.125
2024-08-14 10:18:45,771 INFO [train_multi_KD3.py:1116] (1/4) Epoch 18, batch 14450, loss[loss=0.08364, beats_loss=0.0127, ecapa_loss=0.0001273, whisper_loss=0.06967, over 20809.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001567, whisper_loss=0.09019, over 3903354.39 frames. ], batch size: 84, lr: 3.43e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:18:54,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2608220.0, ans=0.2
2024-08-14 10:18:59,709 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 10:19:09,498 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 from AS
2024-08-14 10:19:55,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 0, loss[loss=0.07162, beats_loss=0.01587, ecapa_loss=0.0001428, whisper_loss=0.05432, over 15131.00 frames. ], tot_loss[loss=0.07162, beats_loss=0.01587, ecapa_loss=0.0001428, whisper_loss=0.05432, over 15131.00 frames. ], batch size: 62, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:19:55,474 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-14 10:20:37,915 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005486, whisper_loss=0.2484, over 922467.00 frames.
2024-08-14 10:20:53,910 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on SV_voxceleb1: loss=0.004382, beats_loss=0, ecapa_loss=0.0004382, whisper_loss=0, over 939242.00 frames.
2024-08-14 10:22:56,777 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 10:22:56,780 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-14 10:23:08,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.21 vs. limit=15.0
2024-08-14 10:23:09,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.576e+01 3.022e+01 6.974e+01, threshold=5.152e+01, percent-clipped=1.0
2024-08-14 10:23:28,983 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 from AS
2024-08-14 10:23:50,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2608720.0, ans=0.125
2024-08-14 10:24:08,335 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 14 from Vox, 31 from AS
2024-08-14 10:24:11,381 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.727e+01
2024-08-14 10:24:18,245 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS
2024-08-14 10:24:24,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2608820.0, ans=0.125
2024-08-14 10:24:29,754 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 from AS
2024-08-14 10:25:06,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 50, loss[loss=0.1161, beats_loss=0.009708, ecapa_loss=0.0001482, whisper_loss=0.1049, over 22384.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01005, ecapa_loss=0.0001649, whisper_loss=0.08897, over 858475.12 frames. ], batch size: 85, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:25:09,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2609020.0, ans=0.2
2024-08-14 10:25:11,540 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 from AS
2024-08-14 10:25:42,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2609120.0, ans=0.125
2024-08-14 10:25:47,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2609120.0, ans=0.0
2024-08-14 10:26:14,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2609220.0, ans=0.1
2024-08-14 10:26:44,437 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 from AS
2024-08-14 10:26:54,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2609420.0, ans=0.125
2024-08-14 10:26:58,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2609420.0, ans=0.0
2024-08-14 10:27:04,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 100, loss[loss=0.09179, beats_loss=0.01133, ecapa_loss=0.0001323, whisper_loss=0.07913, over 22320.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009831, ecapa_loss=0.0001603, whisper_loss=0.08892, over 1500245.18 frames. ], batch size: 89, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:27:16,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.614e+01 2.833e+01 3.144e+01 8.943e+01, threshold=5.666e+01, percent-clipped=3.0
2024-08-14 10:27:23,936 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 26 from Vox, 22 from AS
2024-08-14 10:27:24,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2609520.0, ans=0.0
2024-08-14 10:27:24,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0
2024-08-14 10:28:05,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2609720.0, ans=0.2
2024-08-14 10:28:13,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2609820.0, ans=0.125
2024-08-14 10:28:19,469 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 12 from Vox, 40 from AS
2024-08-14 10:28:31,988 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 37 from LS+wenet, 20 from Vox, 39 from AS
2024-08-14 10:28:36,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2609920.0, ans=0.125
2024-08-14 10:28:56,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 150, loss[loss=0.109, beats_loss=0.009721, ecapa_loss=0.0001582, whisper_loss=0.09766, over 14022.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009842, ecapa_loss=0.0001612, whisper_loss=0.08918, over 1970170.61 frames. ], batch size: 56, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:29:13,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2610120.0, ans=0.125
2024-08-14 10:29:34,341 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 from AS
2024-08-14 10:29:40,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2610220.0, ans=0.125
2024-08-14 10:30:00,241 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 from AS
2024-08-14 10:30:01,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2610420.0, ans=0.125
2024-08-14 10:30:18,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 200, loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001782, whisper_loss=0.08901, over 14278.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.009904, ecapa_loss=0.0001616, whisper_loss=0.09108, over 2381675.26 frames. ], batch size: 56, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:30:25,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.407e+01 2.774e+01 3.039e+01 4.574e+01, threshold=5.548e+01, percent-clipped=0.0
2024-08-14 10:30:40,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2610620.0, ans=0.125
2024-08-14 10:30:47,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2610720.0, ans=0.05
2024-08-14 10:30:58,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2610720.0, ans=0.09899494936611666
2024-08-14 10:31:09,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2610820.0, ans=0.1
2024-08-14 10:31:16,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2610820.0, ans=0.0
2024-08-14 10:31:23,908 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 15 from Vox, 37 from AS
2024-08-14 10:31:36,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2611020.0, ans=0.1
2024-08-14 10:31:37,724 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 250, loss[loss=0.1201, beats_loss=0.009524, ecapa_loss=0.000158, whisper_loss=0.109, over 23869.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0101, ecapa_loss=0.0001611, whisper_loss=0.09087, over 2682533.87 frames. ], batch size: 91, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:31:37,893 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 13 from LS+wenet, 17 from Vox, 33 from AS
2024-08-14 10:32:06,830 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 17 from Vox, 46 from AS
2024-08-14 10:32:13,033 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 from AS
2024-08-14 10:32:14,505 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 from AS
2024-08-14 10:32:15,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2611220.0, ans=0.125
2024-08-14 10:32:27,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2024-08-14 10:32:31,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2611320.0, ans=0.125
2024-08-14 10:33:01,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 300, loss[loss=0.1249, beats_loss=0.009412, ecapa_loss=0.0001587, whisper_loss=0.1139, over 23121.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01029, ecapa_loss=0.0001608, whisper_loss=0.09029, over 2929132.15 frames. ], batch size: 88, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:33:10,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.366e+01 2.598e+01 2.945e+01 2.183e+02, threshold=5.197e+01, percent-clipped=2.0
2024-08-14 10:33:37,576 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 10 from Vox, 28 from AS
2024-08-14 10:33:46,007 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0
2024-08-14 10:33:47,637 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 from AS
2024-08-14 10:33:58,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2611820.0, ans=0.125
2024-08-14 10:34:00,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2611820.0, ans=0.125
2024-08-14 10:34:03,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2611820.0, ans=0.2
2024-08-14 10:34:18,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2611920.0, ans=0.1
2024-08-14 10:34:19,272 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 from AS
2024-08-14 10:34:22,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2611920.0, ans=0.1
2024-08-14 10:34:25,982 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 22 from Vox, 21 from AS
2024-08-14 10:34:29,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 350, loss[loss=0.1173, beats_loss=0.01071, ecapa_loss=0.0001417, whisper_loss=0.1052, over 18857.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001591, whisper_loss=0.09024, over 3116042.40 frames. ], batch size: 72, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:34:50,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2612120.0, ans=22.5
2024-08-14 10:34:56,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=2612120.0, ans=0.2
2024-08-14 10:35:24,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2612320.0, ans=0.0
2024-08-14 10:35:24,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2612320.0, ans=0.0
2024-08-14 10:35:40,874 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 from AS
2024-08-14 10:35:45,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0
2024-08-14 10:35:53,724 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 400, loss[loss=0.1196, beats_loss=0.009199, ecapa_loss=0.0001571, whisper_loss=0.1089, over 21663.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001578, whisper_loss=0.08955, over 3266026.45 frames. ], batch size: 85, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:36:01,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+01 2.324e+01 2.549e+01 2.797e+01 3.225e+02, threshold=5.099e+01, percent-clipped=2.0
2024-08-14 10:36:14,789 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 from AS
2024-08-14 10:36:32,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2612720.0, ans=0.1
2024-08-14 10:36:35,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2612720.0, ans=0.125
2024-08-14 10:36:51,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2612820.0, ans=0.0
2024-08-14 10:36:59,860 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 from AS
2024-08-14 10:37:00,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2612920.0, ans=0.125
2024-08-14 10:37:09,464 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 from AS
2024-08-14 10:37:13,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2612920.0, ans=0.2
2024-08-14 10:37:17,647 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 450, loss[loss=0.09042, beats_loss=0.0107, ecapa_loss=0.0001706, whisper_loss=0.07802, over 14843.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.000157, whisper_loss=0.08989, over 3387056.09 frames. ], batch size: 60, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:37:38,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2613120.0, ans=0.125
2024-08-14 10:37:38,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2613120.0, ans=0.125
2024-08-14 10:37:49,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2613120.0, ans=0.0
2024-08-14 10:37:52,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2613220.0, ans=0.1
2024-08-14 10:38:06,744 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 31 from LS+wenet, 19 from Vox, 19 from AS
2024-08-14 10:38:13,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=8.0
2024-08-14 10:38:40,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0
2024-08-14 10:38:47,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 500, loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001595, whisper_loss=0.08988, over 15587.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001564, whisper_loss=0.0911, over 3502376.31 frames. ], batch size: 64, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:38:56,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.547e+01 2.928e+01 5.420e+01, threshold=5.093e+01, percent-clipped=1.0
2024-08-14 10:39:11,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2613620.0, ans=0.04949747468305833
2024-08-14 10:39:12,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=22.5
2024-08-14 10:39:12,702 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 from AS
2024-08-14 10:39:19,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=22.5
2024-08-14 10:40:02,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.10 vs. limit=22.5
2024-08-14 10:40:08,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2613920.0, ans=0.125
2024-08-14 10:40:18,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 550, loss[loss=0.1517, beats_loss=0.00672, ecapa_loss=0.0001646, whisper_loss=0.1433, over 22460.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.000155, whisper_loss=0.09117, over 3593935.21 frames. ], batch size: 83, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:40:22,879 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 from AS
2024-08-14 10:40:28,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2614020.0, ans=0.0
2024-08-14 10:40:34,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=22.5
2024-08-14 10:40:35,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0
2024-08-14 10:40:46,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2614120.0, ans=0.5
2024-08-14 10:41:03,621 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 21 from LS+wenet, 26 from Vox, 35 from AS
2024-08-14 10:41:29,258 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 25 from Vox, 26 from AS
2024-08-14 10:41:34,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.01 vs. limit=22.5
2024-08-14 10:41:36,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2614420.0, ans=0.1
2024-08-14 10:41:46,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 600, loss[loss=0.1139, beats_loss=0.009524, ecapa_loss=0.0001646, whisper_loss=0.1027, over 22311.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001533, whisper_loss=0.09066, over 3644980.63 frames. ], batch size: 87, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:41:48,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2614520.0, ans=0.0
2024-08-14 10:41:51,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2614520.0, ans=0.0
2024-08-14 10:41:53,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.288e+01 2.520e+01 2.805e+01 9.045e+01, threshold=5.041e+01, percent-clipped=2.0
2024-08-14 10:42:00,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2614520.0, ans=0.0
2024-08-14 10:42:09,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2614620.0, ans=0.125
2024-08-14 10:42:16,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0
2024-08-14 10:42:20,882 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS
2024-08-14 10:42:46,715 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 10:42:48,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2614920.0, ans=0.125
2024-08-14 10:42:49,302 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 from AS
2024-08-14 10:42:57,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2614920.0, ans=0.0
2024-08-14 10:43:03,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 650, loss[loss=0.1057, beats_loss=0.01135, ecapa_loss=0.0001475, whisper_loss=0.09285, over 21131.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001537, whisper_loss=0.09031, over 3689907.89 frames. ], batch size: 83, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:43:06,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2615020.0, ans=0.0
2024-08-14 10:43:07,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2615020.0, ans=0.0
2024-08-14 10:43:10,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2615020.0, ans=0.0
2024-08-14 10:43:24,984 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 16 from LS+wenet, 27 from Vox, 40 from AS
2024-08-14 10:43:28,305 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 from AS
2024-08-14 10:43:35,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2615220.0, ans=0.1
2024-08-14 10:43:39,433 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0
2024-08-14 10:43:40,077 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 27 from Vox, 30 from AS
2024-08-14 10:43:42,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2615220.0, ans=0.0
2024-08-14 10:43:46,561 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 from AS
2024-08-14 10:43:49,573 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 10:44:05,735 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS
2024-08-14 10:44:07,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2615420.0, ans=0.125
2024-08-14 10:44:08,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2615420.0, ans=0.125
2024-08-14 10:44:10,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2615420.0, ans=0.2
2024-08-14 10:44:13,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 700, loss[loss=0.1149, beats_loss=0.01113, ecapa_loss=0.0001338, whisper_loss=0.1025, over 23059.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001541, whisper_loss=0.09101, over 3757421.72 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:44:19,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.455e+01 2.625e+01 2.898e+01 4.319e+01, threshold=5.251e+01, percent-clipped=0.0
2024-08-14 10:44:25,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2615620.0, ans=0.0
2024-08-14 10:44:29,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-08-14 10:44:40,957 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=12.0
2024-08-14 10:44:42,843 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 from AS
2024-08-14 10:44:47,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2615720.0, ans=0.125
2024-08-14 10:45:01,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2615820.0, ans=10.0
2024-08-14 10:45:02,786 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 from AS
2024-08-14 10:45:04,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2615820.0, ans=0.125
2024-08-14 10:45:12,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2615920.0, ans=0.125
2024-08-14 10:45:13,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2615920.0, ans=0.125
2024-08-14 10:45:18,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2615920.0, ans=0.07
2024-08-14 10:45:20,522 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 750, loss[loss=0.1009, beats_loss=0.01246, ecapa_loss=0.0001779, whisper_loss=0.08661, over 15975.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.000153, whisper_loss=0.09088, over 3745871.62 frames. ], batch size: 67, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:45:32,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2616120.0, ans=0.0
2024-08-14 10:45:52,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2616220.0, ans=0.125
2024-08-14 10:46:01,052 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 from AS
2024-08-14 10:46:04,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2616320.0, ans=0.1
2024-08-14 10:46:06,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2616320.0, ans=0.1
2024-08-14 10:46:07,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2616320.0, ans=0.125
2024-08-14 10:46:07,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2616320.0, ans=0.125
2024-08-14 10:46:15,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2616420.0, ans=0.0
2024-08-14 10:46:23,558 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 from AS
2024-08-14 10:46:27,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 800, loss[loss=0.07154, beats_loss=0.01432, ecapa_loss=0.0001719, whisper_loss=0.0555, over 22330.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001522, whisper_loss=0.09107, over 3799400.25 frames. ], batch size: 95, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:46:27,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2616520.0, ans=0.1
2024-08-14 10:46:33,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.304e+01 2.552e+01 2.845e+01 4.485e+01, threshold=5.104e+01, percent-clipped=0.0
2024-08-14 10:46:50,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2616620.0, ans=0.125
2024-08-14 10:46:56,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2616720.0, ans=0.2
2024-08-14 10:46:58,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.12 vs. limit=15.0
2024-08-14 10:47:00,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2616720.0, ans=0.1
2024-08-14 10:47:14,565 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 15 from Vox, 40 from AS
2024-08-14 10:47:22,515 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 from AS
2024-08-14 10:47:34,632 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 850, loss[loss=0.1019, beats_loss=0.009573, ecapa_loss=0.0001838, whisper_loss=0.09051, over 19443.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001526, whisper_loss=0.09073, over 3804825.25 frames. ], batch size: 81, lr: 3.33e-03, grad_scale: 5.764607523034235e+17
2024-08-14 10:47:36,193 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts.
29 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 10:47:39,542 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2024-08-14 10:47:45,854 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 10:47:56,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2024-08-14 10:48:03,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2617220.0, ans=0.125 2024-08-14 10:48:05,805 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-14 10:48:11,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2617220.0, ans=0.0 2024-08-14 10:48:16,096 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 10:48:42,391 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 900, loss[loss=0.09465, beats_loss=0.008702, ecapa_loss=0.0001695, whisper_loss=0.08425, over 14506.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001532, whisper_loss=0.09037, over 3799282.88 frames. ], batch size: 60, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:48:49,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.308e+01 2.548e+01 2.901e+01 4.285e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-14 10:48:49,671 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 10:48:49,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2617520.0, ans=0.0 2024-08-14 10:49:21,327 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 10:49:32,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2617820.0, ans=0.0 2024-08-14 10:49:35,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2617920.0, ans=0.125 2024-08-14 10:49:44,202 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 10:49:45,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2617920.0, ans=0.1 2024-08-14 10:49:49,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 950, loss[loss=0.1203, beats_loss=0.01069, ecapa_loss=0.0001601, whisper_loss=0.108, over 23889.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001533, whisper_loss=0.09064, over 3799895.70 frames. ], batch size: 93, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:50:11,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2618120.0, ans=0.1 2024-08-14 10:50:15,598 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 10:50:15,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2618220.0, ans=0.0 2024-08-14 10:50:46,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2618420.0, ans=0.125 2024-08-14 10:50:47,602 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 10:50:50,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2618420.0, ans=0.125 2024-08-14 10:50:54,835 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:50:57,238 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1000, loss[loss=0.07812, beats_loss=0.0103, ecapa_loss=0.0001522, whisper_loss=0.0663, over 16223.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001527, whisper_loss=0.09064, over 3763785.21 frames. ], batch size: 64, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:51:03,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.36 vs. limit=10.0 2024-08-14 10:51:03,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.412e+01 2.681e+01 3.043e+01 1.164e+02, threshold=5.362e+01, percent-clipped=2.0 2024-08-14 10:51:12,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2024-08-14 10:51:12,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2618620.0, ans=0.125 2024-08-14 10:51:16,701 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 10:51:28,192 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2024-08-14 10:51:37,018 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 10:51:38,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2618820.0, ans=0.0 2024-08-14 10:51:42,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2618820.0, ans=0.0 2024-08-14 10:51:51,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2618920.0, ans=0.1 2024-08-14 10:51:59,820 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 10:52:01,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2024-08-14 10:52:03,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1050, loss[loss=0.09567, beats_loss=0.01115, ecapa_loss=0.0001496, whisper_loss=0.08302, over 18139.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001519, whisper_loss=0.0902, over 3722560.02 frames. ], batch size: 70, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:52:17,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2619120.0, ans=0.5 2024-08-14 10:52:27,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. 
limit=15.0 2024-08-14 10:52:29,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2619220.0, ans=0.1 2024-08-14 10:52:34,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2619220.0, ans=0.125 2024-08-14 10:52:47,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2619320.0, ans=0.5 2024-08-14 10:52:48,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2619320.0, ans=0.0 2024-08-14 10:53:05,737 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 10:53:11,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1100, loss[loss=0.107, beats_loss=0.01132, ecapa_loss=0.0001251, whisper_loss=0.09443, over 23332.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001528, whisper_loss=0.09122, over 3766007.43 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:53:17,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.365e+01 2.665e+01 2.962e+01 1.430e+02, threshold=5.329e+01, percent-clipped=2.0 2024-08-14 10:53:23,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2024-08-14 10:53:28,084 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 10:53:57,276 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 10:53:59,922 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 10:54:08,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2619920.0, ans=0.125 2024-08-14 10:54:17,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1150, loss[loss=0.1021, beats_loss=0.009529, ecapa_loss=0.0001552, whisper_loss=0.09104, over 22779.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001524, whisper_loss=0.09097, over 3782112.71 frames. ], batch size: 92, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:54:48,758 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 10:54:48,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2620220.0, ans=0.125 2024-08-14 10:54:59,364 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 10:55:03,488 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 10:55:06,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2620320.0, ans=0.0 2024-08-14 10:55:07,431 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 10:55:07,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2620320.0, ans=0.125 2024-08-14 10:55:13,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=12.0 2024-08-14 10:55:14,017 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 10:55:24,656 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1200, loss[loss=0.1037, beats_loss=0.01163, ecapa_loss=0.0001136, whisper_loss=0.09091, over 23047.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001516, whisper_loss=0.09094, over 3804600.78 frames. ], batch size: 91, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:55:31,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.349e+01 2.616e+01 2.854e+01 5.362e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 10:55:43,554 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 32 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 10:55:53,790 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 10:55:54,356 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-08-14 10:55:55,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2620720.0, ans=0.125 2024-08-14 10:56:03,900 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=15.0 2024-08-14 10:56:04,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2620820.0, ans=0.125 2024-08-14 10:56:04,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2024-08-14 10:56:12,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2620820.0, ans=0.1 2024-08-14 10:56:20,249 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
21 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-14 10:56:22,768 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 10:56:23,364 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=22.5 2024-08-14 10:56:24,386 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:56:31,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1250, loss[loss=0.08359, beats_loss=0.01289, ecapa_loss=0.0001371, whisper_loss=0.06933, over 15174.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.000151, whisper_loss=0.09011, over 3811589.95 frames. ], batch size: 59, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:56:41,578 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 10:56:42,835 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 10:56:49,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2621120.0, ans=0.125 2024-08-14 10:56:57,758 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 10:57:05,660 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
17 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 10:57:11,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2621320.0, ans=0.125 2024-08-14 10:57:11,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2621320.0, ans=0.2 2024-08-14 10:57:39,396 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1300, loss[loss=0.1067, beats_loss=0.008427, ecapa_loss=0.0001414, whisper_loss=0.09687, over 21280.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001514, whisper_loss=0.09018, over 3817892.19 frames. ], batch size: 77, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:57:45,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.286e+01 2.497e+01 2.754e+01 3.684e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-14 10:57:48,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2024-08-14 10:57:51,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2621620.0, ans=0.125 2024-08-14 10:57:53,801 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 10:57:58,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2621620.0, ans=0.1 2024-08-14 10:58:08,712 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-14 10:58:10,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. 
limit=10.0 2024-08-14 10:58:36,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2024-08-14 10:58:39,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0 2024-08-14 10:58:46,608 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1350, loss[loss=0.08441, beats_loss=0.01037, ecapa_loss=0.0001439, whisper_loss=0.0726, over 18583.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01072, ecapa_loss=0.0001518, whisper_loss=0.08985, over 3792959.34 frames. ], batch size: 72, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:58:46,783 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 10:59:21,083 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 10:59:38,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2622420.0, ans=0.125 2024-08-14 10:59:48,235 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 10:59:48,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-14 10:59:51,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2622420.0, ans=0.0 2024-08-14 10:59:53,314 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1400, loss[loss=0.09276, beats_loss=0.01122, ecapa_loss=0.0001539, whisper_loss=0.08, over 15724.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.000152, whisper_loss=0.09002, over 3814067.54 frames. 
], batch size: 61, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:59:59,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.361e+01 2.575e+01 2.810e+01 4.774e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 11:00:04,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2622520.0, ans=0.125 2024-08-14 11:00:10,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2622620.0, ans=0.0 2024-08-14 11:00:22,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2622720.0, ans=0.125 2024-08-14 11:00:23,227 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 11:00:24,433 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 11:00:36,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2622820.0, ans=0.0 2024-08-14 11:00:44,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.58 vs. limit=22.5 2024-08-14 11:00:51,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2622920.0, ans=0.125 2024-08-14 11:00:54,839 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 11:00:55,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2622920.0, ans=0.025 2024-08-14 11:00:59,961 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1450, loss[loss=0.09276, beats_loss=0.01253, ecapa_loss=0.0001529, whisper_loss=0.0787, over 19729.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001512, whisper_loss=0.0902, over 3808206.60 frames. ], batch size: 83, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:01:13,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2623020.0, ans=0.125 2024-08-14 11:01:15,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2623020.0, ans=0.125 2024-08-14 11:01:17,018 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 11:01:24,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2623020.0, ans=0.0 2024-08-14 11:01:44,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2623220.0, ans=0.0 2024-08-14 11:01:55,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.45 vs. 
limit=15.0 2024-08-14 11:01:59,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2623320.0, ans=0.07 2024-08-14 11:01:59,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2623320.0, ans=0.125 2024-08-14 11:02:03,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2623320.0, ans=0.1 2024-08-14 11:02:09,104 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 11:02:23,555 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1500, loss[loss=0.112, beats_loss=0.009667, ecapa_loss=0.000166, whisper_loss=0.1006, over 15737.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01073, ecapa_loss=0.0001507, whisper_loss=0.08927, over 3807708.54 frames. ], batch size: 61, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:02:30,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.366e+01 2.617e+01 2.967e+01 6.359e+01, threshold=5.234e+01, percent-clipped=3.0 2024-08-14 11:02:31,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2623520.0, ans=0.125 2024-08-14 11:02:31,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2623520.0, ans=15.0 2024-08-14 11:02:37,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2024-08-14 11:02:37,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.99 vs. 
limit=15.0 2024-08-14 11:02:38,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2623620.0, ans=0.125 2024-08-14 11:02:39,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2623620.0, ans=0.0 2024-08-14 11:02:40,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2024-08-14 11:03:08,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2623820.0, ans=0.0 2024-08-14 11:03:11,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.22 vs. limit=10.0 2024-08-14 11:03:18,050 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 11:03:18,334 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:03:37,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1550, loss[loss=0.08767, beats_loss=0.01234, ecapa_loss=0.0001151, whisper_loss=0.07418, over 16522.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01069, ecapa_loss=0.000151, whisper_loss=0.08915, over 3827881.86 frames. ], batch size: 64, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:04:06,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2624220.0, ans=0.0 2024-08-14 11:04:09,307 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 11:04:12,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. 
limit=15.0 2024-08-14 11:04:13,680 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 11:04:28,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2624320.0, ans=0.125 2024-08-14 11:04:33,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=12.0 2024-08-14 11:04:49,822 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2024-08-14 11:04:54,177 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1600, loss[loss=0.07807, beats_loss=0.01289, ecapa_loss=0.0001351, whisper_loss=0.06383, over 20065.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001524, whisper_loss=0.08961, over 3816689.04 frames. ], batch size: 78, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:05:01,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.369e+01 2.524e+01 2.843e+01 4.192e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-14 11:05:02,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0 2024-08-14 11:05:07,269 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 11:05:22,785 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 11:05:24,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2624720.0, ans=0.1 2024-08-14 11:05:31,554 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 11:05:33,307 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 11:05:46,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2624820.0, ans=0.1 2024-08-14 11:06:06,107 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 11:06:09,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1650, loss[loss=0.07194, beats_loss=0.01391, ecapa_loss=0.0001647, whisper_loss=0.05639, over 20488.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001508, whisper_loss=0.08969, over 3831210.29 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:06:13,073 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.357e+00 2024-08-14 11:06:14,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2625020.0, ans=0.1 2024-08-14 11:06:22,022 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 11:06:22,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2625020.0, ans=0.2 2024-08-14 11:06:23,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2625120.0, ans=0.125 2024-08-14 11:06:52,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2625220.0, ans=0.0 2024-08-14 11:07:14,452 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
21 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-14 11:07:14,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2625420.0, ans=0.125 2024-08-14 11:07:19,055 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 11:07:23,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-08-14 11:07:25,605 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1700, loss[loss=0.09844, beats_loss=0.01204, ecapa_loss=0.0001395, whisper_loss=0.08501, over 21088.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001505, whisper_loss=0.08985, over 3841205.47 frames. ], batch size: 85, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:07:27,535 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 17 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 11:07:32,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.263e+01 2.524e+01 2.794e+01 4.972e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-14 11:07:38,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-14 11:07:48,579 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 11:07:48,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2625620.0, ans=0.2 2024-08-14 11:08:21,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2625820.0, ans=0.1 2024-08-14 11:08:21,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-14 11:08:32,389 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2024-08-14 11:08:35,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2024-08-14 11:08:41,416 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1750, loss[loss=0.08481, beats_loss=0.01371, ecapa_loss=0.000126, whisper_loss=0.06983, over 14669.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01064, ecapa_loss=0.0001514, whisper_loss=0.08887, over 3822631.73 frames. ], batch size: 61, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:09:06,850 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 11:09:16,582 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.16 vs. 
limit=15.0 2024-08-14 11:09:31,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2626320.0, ans=0.1 2024-08-14 11:09:32,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2626320.0, ans=0.2 2024-08-14 11:09:48,212 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 11:09:51,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2626420.0, ans=15.0 2024-08-14 11:09:53,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2626420.0, ans=0.1 2024-08-14 11:09:54,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2626520.0, ans=0.1 2024-08-14 11:09:55,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1800, loss[loss=0.1104, beats_loss=0.0118, ecapa_loss=0.0001419, whisper_loss=0.09716, over 20424.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01059, ecapa_loss=0.0001518, whisper_loss=0.0889, over 3812006.94 frames. ], batch size: 80, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:10:05,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.260e+01 2.552e+01 2.816e+01 4.964e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-14 11:10:10,958 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 11:10:11,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2626620.0, ans=0.2 2024-08-14 11:10:12,675 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 11:10:16,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2626620.0, ans=0.0 2024-08-14 11:10:36,546 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 11:10:40,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2626820.0, ans=0.1 2024-08-14 11:10:40,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-14 11:10:48,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2626820.0, ans=0.04949747468305833 2024-08-14 11:11:11,055 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1850, loss[loss=0.09327, beats_loss=0.009601, ecapa_loss=0.0001595, whisper_loss=0.08207, over 14949.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001532, whisper_loss=0.09022, over 3834953.10 frames. ], batch size: 60, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:11:14,475 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 11:11:18,399 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-14 11:11:20,205 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:11:24,887 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 11:11:38,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. 
limit=15.0 2024-08-14 11:12:05,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2627320.0, ans=0.125 2024-08-14 11:12:07,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2627320.0, ans=0.125 2024-08-14 11:12:21,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2024-08-14 11:12:25,095 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1900, loss[loss=0.1015, beats_loss=0.01188, ecapa_loss=0.0001315, whisper_loss=0.08833, over 23138.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001519, whisper_loss=0.08998, over 3831452.15 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:12:33,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.313e+01 2.505e+01 2.769e+01 4.411e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-14 11:13:08,052 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 11:13:13,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-08-14 11:13:27,072 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 11:13:37,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2627920.0, ans=6.0 2024-08-14 11:13:38,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2628020.0, ans=0.1 2024-08-14 11:13:39,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 1950, loss[loss=0.1026, beats_loss=0.01175, ecapa_loss=0.0001305, whisper_loss=0.08953, over 16078.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001515, whisper_loss=0.08929, over 3805626.44 frames. ], batch size: 61, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:13:42,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2628020.0, ans=0.0 2024-08-14 11:13:46,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2024-08-14 11:13:51,104 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 11:13:54,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2628120.0, ans=0.125 2024-08-14 11:14:08,692 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 11:14:15,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2628220.0, ans=0.125 2024-08-14 11:14:31,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. 
limit=15.0 2024-08-14 11:14:32,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2628320.0, ans=0.125 2024-08-14 11:14:37,266 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 18 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 11:14:40,697 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 11:14:44,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2628420.0, ans=0.0 2024-08-14 11:14:51,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2628420.0, ans=0.125 2024-08-14 11:14:52,403 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 11:14:54,976 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2000, loss[loss=0.1006, beats_loss=0.01019, ecapa_loss=0.0001734, whisper_loss=0.08863, over 19356.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001511, whisper_loss=0.08923, over 3828038.53 frames. ], batch size: 78, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:15:04,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.355e+01 2.639e+01 2.929e+01 2.426e+02, threshold=5.277e+01, percent-clipped=2.0 2024-08-14 11:15:13,512 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 11:15:13,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2628620.0, ans=0.0 2024-08-14 11:15:16,464 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 11:15:32,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2024-08-14 11:15:53,291 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 11:16:12,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2050, loss[loss=0.08182, beats_loss=0.01087, ecapa_loss=0.0001423, whisper_loss=0.06953, over 22615.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.0001511, whisper_loss=0.08911, over 3851719.69 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:16:17,022 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-14 11:16:51,451 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 11:16:51,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2629220.0, ans=0.1 2024-08-14 11:17:04,432 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 11:17:18,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2629420.0, ans=0.125 2024-08-14 11:17:20,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2629420.0, ans=0.125 2024-08-14 11:17:23,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2629420.0, ans=0.0 2024-08-14 11:17:30,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2100, loss[loss=0.1263, beats_loss=0.009622, ecapa_loss=0.0001691, whisper_loss=0.115, over 22420.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001511, whisper_loss=0.08966, over 3823748.65 frames. ], batch size: 91, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:17:31,111 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 12 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 11:17:39,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.267e+01 2.457e+01 2.784e+01 3.709e+01, threshold=4.913e+01, percent-clipped=0.0 2024-08-14 11:17:59,309 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09374744445085526, model_norm_threshold=49.13302230834961 2024-08-14 11:17:59,517 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.336e+04, grad_sumsq=7.336e+04, orig_rms_sq=1.000e+00 2024-08-14 11:17:59,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2629620.0, ans=0.1 2024-08-14 11:18:29,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2629820.0, ans=0.125 2024-08-14 11:18:35,898 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 11:18:39,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2629920.0, ans=0.0 2024-08-14 11:18:49,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2150, loss[loss=0.07698, beats_loss=0.01253, ecapa_loss=0.0001775, whisper_loss=0.06268, over 15372.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01066, ecapa_loss=0.0001512, whisper_loss=0.08946, over 3830624.44 frames. ], batch size: 64, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:19:15,514 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 11:19:15,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2630120.0, ans=0.125 2024-08-14 11:19:29,330 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 11:19:30,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-14 11:19:31,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2630220.0, ans=0.125 2024-08-14 11:19:33,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2630220.0, ans=0.2 2024-08-14 11:19:41,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2630320.0, ans=0.07 2024-08-14 11:19:55,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2630420.0, ans=0.2 2024-08-14 11:20:03,987 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 11:20:08,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2200, loss[loss=0.105, beats_loss=0.01041, ecapa_loss=0.000186, whisper_loss=0.09273, over 21571.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001509, whisper_loss=0.09037, over 3821000.89 frames. 
], batch size: 93, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:20:15,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2630520.0, ans=0.0 2024-08-14 11:20:16,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2630520.0, ans=0.125 2024-08-14 11:20:17,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.405e+01 2.616e+01 2.970e+01 5.241e+02, threshold=5.232e+01, percent-clipped=2.0 2024-08-14 11:20:21,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2630520.0, ans=0.1 2024-08-14 11:20:28,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2630620.0, ans=0.125 2024-08-14 11:20:30,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2630620.0, ans=0.0 2024-08-14 11:20:36,831 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 11:20:44,398 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 11:20:47,180 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 11:21:11,643 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 11:21:15,067 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 11:21:18,431 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 11:21:21,177 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 11:21:21,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2630920.0, ans=0.0 2024-08-14 11:21:25,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-14 11:21:29,095 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2250, loss[loss=0.1068, beats_loss=0.01094, ecapa_loss=0.0001572, whisper_loss=0.09426, over 22115.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001509, whisper_loss=0.09075, over 3860154.17 frames. ], batch size: 91, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:21:30,477 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 11:21:32,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2631020.0, ans=0.07 2024-08-14 11:21:57,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2631120.0, ans=0.0 2024-08-14 11:21:59,600 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
21 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-14 11:22:01,641 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:22:03,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2631220.0, ans=0.0 2024-08-14 11:22:03,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2631220.0, ans=6.0 2024-08-14 11:22:27,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2631320.0, ans=0.125 2024-08-14 11:22:49,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2300, loss[loss=0.08626, beats_loss=0.01186, ecapa_loss=0.0001447, whisper_loss=0.07296, over 23668.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.0001505, whisper_loss=0.09052, over 3864574.06 frames. ], batch size: 95, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:22:49,373 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 11:22:57,299 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-14 11:22:58,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.348e+01 2.609e+01 2.849e+01 2.533e+02, threshold=5.217e+01, percent-clipped=1.0 2024-08-14 11:23:08,891 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 11:23:32,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.03 vs. 
limit=15.0 2024-08-14 11:23:48,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2631820.0, ans=0.125 2024-08-14 11:23:57,575 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 11:24:08,791 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2350, loss[loss=0.07556, beats_loss=0.01301, ecapa_loss=0.0001394, whisper_loss=0.06115, over 14224.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001508, whisper_loss=0.09119, over 3878851.19 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:24:16,404 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 9 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 11:24:21,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2632020.0, ans=0.125 2024-08-14 11:24:27,876 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 11:24:33,678 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 11:24:33,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2632120.0, ans=0.0 2024-08-14 11:24:47,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-08-14 11:24:56,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2632320.0, ans=0.09899494936611666 2024-08-14 11:25:29,178 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2400, loss[loss=0.07846, beats_loss=0.01235, ecapa_loss=0.0001606, whisper_loss=0.0645, over 15189.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001527, whisper_loss=0.0903, over 3855919.84 frames. ], batch size: 65, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:25:34,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2632520.0, ans=0.1 2024-08-14 11:25:38,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.318e+01 2.574e+01 2.948e+01 5.851e+01, threshold=5.149e+01, percent-clipped=1.0 2024-08-14 11:25:52,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-14 11:25:53,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-08-14 11:26:06,332 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 11:26:15,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2632820.0, ans=0.0 2024-08-14 11:26:26,173 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 11:26:41,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2632920.0, ans=0.1 2024-08-14 11:26:45,708 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 11:26:46,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2450, loss[loss=0.1091, beats_loss=0.01094, ecapa_loss=0.000126, whisper_loss=0.09688, over 17009.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001527, whisper_loss=0.09034, over 3868402.89 frames. 
], batch size: 64, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:27:08,376 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 11:27:16,062 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-14 11:27:31,412 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 11:27:33,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2633320.0, ans=0.2 2024-08-14 11:27:37,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2633320.0, ans=0.125 2024-08-14 11:28:03,464 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2500, loss[loss=0.09844, beats_loss=0.01357, ecapa_loss=0.0001596, whisper_loss=0.08328, over 21871.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001537, whisper_loss=0.09064, over 3884923.44 frames. ], batch size: 92, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:28:11,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.237e+01 2.442e+01 2.717e+01 4.288e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-14 11:28:12,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-08-14 11:28:15,552 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 11:28:17,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-14 11:28:22,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. 
limit=15.0 2024-08-14 11:28:26,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2633620.0, ans=0.125 2024-08-14 11:28:29,945 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 11:28:40,877 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 11:28:50,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.02 vs. limit=22.5 2024-08-14 11:29:08,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2633920.0, ans=0.0 2024-08-14 11:29:10,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2633920.0, ans=0.125 2024-08-14 11:29:20,818 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2550, loss[loss=0.1012, beats_loss=0.01151, ecapa_loss=0.0001246, whisper_loss=0.08848, over 14387.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001546, whisper_loss=0.09078, over 3863543.63 frames. ], batch size: 54, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:29:24,814 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 11:29:32,466 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 11:29:55,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2634220.0, ans=0.125 2024-08-14 11:30:06,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2634220.0, ans=0.1 2024-08-14 11:30:11,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2634320.0, ans=0.2 2024-08-14 11:30:25,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-08-14 11:30:30,613 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 11:30:35,430 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 11:30:40,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2600, loss[loss=0.1081, beats_loss=0.01055, ecapa_loss=0.0001364, whisper_loss=0.09614, over 17706.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001553, whisper_loss=0.09079, over 3876099.31 frames. ], batch size: 67, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:30:46,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2634520.0, ans=0.125 2024-08-14 11:30:49,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.475e+01 2.751e+01 3.111e+01 1.109e+02, threshold=5.502e+01, percent-clipped=3.0 2024-08-14 11:31:10,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2634720.0, ans=0.0 2024-08-14 11:31:16,128 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
26 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-14 11:31:21,729 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 11:31:33,125 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 11:31:33,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2634820.0, ans=0.125 2024-08-14 11:31:42,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2634920.0, ans=0.0 2024-08-14 11:31:49,216 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 11:31:58,228 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2650, loss[loss=0.101, beats_loss=0.01054, ecapa_loss=0.0001081, whisper_loss=0.0894, over 14535.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001564, whisper_loss=0.09069, over 3856970.03 frames. ], batch size: 54, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:31:59,725 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 11:32:05,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2635020.0, ans=0.125 2024-08-14 11:32:08,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2635020.0, ans=0.125 2024-08-14 11:32:08,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2635020.0, ans=0.1 2024-08-14 11:32:10,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. 
limit=15.0 2024-08-14 11:32:17,724 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.164e+05 2024-08-14 11:32:20,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2635120.0, ans=0.125 2024-08-14 11:32:20,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2635120.0, ans=0.1 2024-08-14 11:32:22,417 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=12.0 2024-08-14 11:32:31,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2635220.0, ans=0.2 2024-08-14 11:32:34,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2635220.0, ans=0.2 2024-08-14 11:32:54,265 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-14 11:33:06,123 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-14 11:33:12,337 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 11:33:13,702 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2700, loss[loss=0.08733, beats_loss=0.01331, ecapa_loss=0.0001293, whisper_loss=0.07272, over 16979.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001543, whisper_loss=0.09005, over 3849355.02 frames. 
], batch size: 66, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:33:22,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.370e+01 2.672e+01 3.079e+01 4.287e+01, threshold=5.344e+01, percent-clipped=0.0 2024-08-14 11:33:23,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5 2024-08-14 11:33:24,124 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 11:33:32,710 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 11:33:52,008 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 11:33:57,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-14 11:33:59,390 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 11:34:10,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2635820.0, ans=0.125 2024-08-14 11:34:23,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2635920.0, ans=0.0 2024-08-14 11:34:37,822 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2750, loss[loss=0.1065, beats_loss=0.008624, ecapa_loss=0.000195, whisper_loss=0.09593, over 20940.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.000155, whisper_loss=0.0905, over 3828038.21 frames. ], batch size: 87, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:35:14,620 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 11:36:01,729 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 11:36:07,550 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2800, loss[loss=0.1381, beats_loss=0.008641, ecapa_loss=0.0001805, whisper_loss=0.1277, over 14625.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001544, whisper_loss=0.09088, over 3816045.79 frames. ], batch size: 57, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:36:08,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0 2024-08-14 11:36:16,131 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 11:36:19,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.323e+01 2.596e+01 2.984e+01 3.829e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-14 11:36:48,827 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 11:36:58,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2636720.0, ans=0.125 2024-08-14 11:37:07,292 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 11:37:29,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2636920.0, ans=0.125 2024-08-14 11:37:39,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2636920.0, ans=0.125 2024-08-14 11:37:48,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2850, loss[loss=0.1051, beats_loss=0.01171, ecapa_loss=0.000155, whisper_loss=0.09187, over 21597.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.000153, whisper_loss=0.0916, over 3843720.12 frames. ], batch size: 87, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:37:51,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2637020.0, ans=0.1 2024-08-14 11:37:58,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2637020.0, ans=0.2 2024-08-14 11:38:00,242 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 11:38:04,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.63 vs. limit=22.5 2024-08-14 11:38:16,477 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 11:38:38,253 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 11:38:56,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2637220.0, ans=0.1 2024-08-14 11:39:14,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2637320.0, ans=0.2 2024-08-14 11:39:38,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2637420.0, ans=0.2 2024-08-14 11:39:51,064 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2900, loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001297, whisper_loss=0.0937, over 21238.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001542, whisper_loss=0.09087, over 3847328.29 frames. ], batch size: 83, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:39:53,069 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 11:40:00,070 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 11:40:05,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.270e+01 2.545e+01 2.877e+01 7.977e+01, threshold=5.090e+01, percent-clipped=2.0 2024-08-14 11:40:10,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2637520.0, ans=0.2 2024-08-14 11:41:02,621 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 11:41:38,846 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 11:41:51,406 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 11:41:57,202 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 2950, loss[loss=0.1003, beats_loss=0.01207, ecapa_loss=0.000157, whisper_loss=0.08663, over 21610.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001552, whisper_loss=0.0909, over 3901381.61 frames. ], batch size: 89, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:43:43,238 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 11:43:53,474 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 11:43:56,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3000, loss[loss=0.1052, beats_loss=0.01245, ecapa_loss=0.0001088, whisper_loss=0.09166, over 20041.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01086, ecapa_loss=0.0001546, whisper_loss=0.09, over 3900742.01 frames. 
], batch size: 76, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:43:56,132 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 11:44:34,422 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005472, whisper_loss=0.2471, over 922467.00 frames. 2024-08-14 11:44:51,313 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on SV_voxceleb1: loss=0.00425, beats_loss=0, ecapa_loss=0.000425, whisper_loss=0, over 939242.00 frames. 2024-08-14 11:46:45,983 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7042, 3.0173, 2.1714, 3.3556], device='cuda:1') 2024-08-14 11:46:48,000 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 11:46:48,004 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 11:46:48,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2638520.0, ans=0.1 2024-08-14 11:46:49,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2638520.0, ans=0.0 2024-08-14 11:46:52,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2638520.0, ans=0.125 2024-08-14 11:46:57,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.515e+01 2.846e+01 3.137e+01 6.212e+01, threshold=5.693e+01, percent-clipped=1.0 2024-08-14 11:47:00,412 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 11:47:19,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2638720.0, ans=0.125 2024-08-14 11:47:23,541 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 11:47:35,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2638820.0, ans=0.0 2024-08-14 11:47:36,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2638820.0, ans=0.125 2024-08-14 11:47:40,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2638820.0, ans=0.05 2024-08-14 11:47:43,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2638820.0, ans=0.05 2024-08-14 11:47:44,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.95 vs. limit=22.5 2024-08-14 11:47:46,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2638820.0, ans=0.0 2024-08-14 11:47:55,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2638920.0, ans=0.2 2024-08-14 11:47:56,600 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 11:48:02,774 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 11:48:04,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2638920.0, ans=0.0 2024-08-14 11:48:07,507 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3050, loss[loss=0.1154, beats_loss=0.01003, ecapa_loss=0.0001565, whisper_loss=0.1038, over 23376.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.0001535, whisper_loss=0.0912, over 3905344.61 frames. ], batch size: 93, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:48:17,368 INFO [train_multi_KD3.py:844] (1/4) A total of 99 cuts. 23 from LS+wenet, 23 from Vox, 53 fro AS 2024-08-14 11:48:18,963 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 11:48:24,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=22.5 2024-08-14 11:49:10,135 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 11:49:21,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2639420.0, ans=0.2 2024-08-14 11:49:30,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3100, loss[loss=0.09045, beats_loss=0.01164, ecapa_loss=0.000126, whisper_loss=0.07755, over 18984.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001533, whisper_loss=0.09084, over 3893958.56 frames. 
], batch size: 75, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:49:39,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.342e+01 2.628e+01 3.036e+01 4.820e+01, threshold=5.256e+01, percent-clipped=0.0 2024-08-14 11:49:40,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2639520.0, ans=0.125 2024-08-14 11:49:40,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2639520.0, ans=0.125 2024-08-14 11:50:12,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2639720.0, ans=0.04949747468305833 2024-08-14 11:50:20,229 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 11:50:29,670 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 11:50:31,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2639920.0, ans=0.05 2024-08-14 11:50:35,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2639920.0, ans=0.1 2024-08-14 11:50:49,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3150, loss[loss=0.1186, beats_loss=0.01042, ecapa_loss=0.0001761, whisper_loss=0.1064, over 22644.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001533, whisper_loss=0.09059, over 3901738.55 frames. ], batch size: 88, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:51:01,286 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 11:51:10,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2640120.0, ans=0.125 2024-08-14 11:51:16,949 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 11:51:34,529 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 11:51:36,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2640320.0, ans=0.125 2024-08-14 11:51:40,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2640320.0, ans=0.05 2024-08-14 11:52:06,887 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3200, loss[loss=0.0847, beats_loss=0.01197, ecapa_loss=0.000126, whisper_loss=0.07147, over 19255.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01087, ecapa_loss=0.0001537, whisper_loss=0.09043, over 3879016.10 frames. ], batch size: 76, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:52:16,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.359e+01 2.591e+01 2.913e+01 5.020e+01, threshold=5.181e+01, percent-clipped=0.0 2024-08-14 11:52:32,136 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 11:53:19,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2640920.0, ans=0.125 2024-08-14 11:53:22,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2641020.0, ans=0.1 2024-08-14 11:53:22,973 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3250, loss[loss=0.1123, beats_loss=0.00882, ecapa_loss=0.0001491, whisper_loss=0.102, over 19186.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001542, whisper_loss=0.09089, over 3887897.49 frames. ], batch size: 72, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:53:33,180 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 11:53:35,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2641020.0, ans=0.125 2024-08-14 11:53:50,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2641120.0, ans=0.2 2024-08-14 11:54:00,163 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.538e+05 2024-08-14 11:54:06,003 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 11:54:11,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2641320.0, ans=0.0 2024-08-14 11:54:14,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2641320.0, ans=0.0 2024-08-14 11:54:32,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2641420.0, ans=0.0 2024-08-14 11:54:35,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2641420.0, ans=0.1 2024-08-14 11:54:41,625 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:54:43,993 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3300, loss[loss=0.09466, beats_loss=0.01071, ecapa_loss=0.0001695, whisper_loss=0.08226, over 20290.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.0001552, whisper_loss=0.09144, over 3909200.03 frames. 
], batch size: 87, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:54:54,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.343e+01 2.686e+01 3.135e+01 1.274e+02, threshold=5.372e+01, percent-clipped=3.0 2024-08-14 11:54:56,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=12.0 2024-08-14 11:55:02,759 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 11:55:06,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2641620.0, ans=0.1 2024-08-14 11:55:28,121 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 11:55:32,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2641820.0, ans=0.125 2024-08-14 11:55:34,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2641820.0, ans=0.2 2024-08-14 11:55:52,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2641920.0, ans=0.125 2024-08-14 11:55:57,356 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 11:56:02,937 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 32 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 11:56:04,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3350, loss[loss=0.1327, beats_loss=0.007133, ecapa_loss=0.0001996, whisper_loss=0.1235, over 18239.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01069, ecapa_loss=0.0001555, whisper_loss=0.09169, over 3885701.86 frames. 
], batch size: 72, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:56:26,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2642120.0, ans=0.1 2024-08-14 11:56:42,098 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 11:56:54,054 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 11:57:17,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-08-14 11:57:18,252 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-14 11:57:23,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3400, loss[loss=0.1066, beats_loss=0.009957, ecapa_loss=0.0001568, whisper_loss=0.09512, over 22583.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001553, whisper_loss=0.09094, over 3902223.48 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 11:57:30,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-14 11:57:34,542 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.486e+01 2.836e+01 3.327e+01 1.695e+02, threshold=5.673e+01, percent-clipped=4.0 2024-08-14 11:57:44,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2642620.0, ans=0.0 2024-08-14 11:57:50,490 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
17 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-14 11:57:56,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2642720.0, ans=0.0 2024-08-14 11:58:08,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2642720.0, ans=0.1 2024-08-14 11:58:22,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2642820.0, ans=0.125 2024-08-14 11:58:24,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2642820.0, ans=0.0 2024-08-14 11:58:25,429 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 11:58:43,967 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3450, loss[loss=0.1109, beats_loss=0.01279, ecapa_loss=0.0001319, whisper_loss=0.09676, over 22750.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01087, ecapa_loss=0.0001556, whisper_loss=0.09021, over 3878872.44 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 11:58:47,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2643020.0, ans=0.125 2024-08-14 11:58:49,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-14 11:58:57,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2643020.0, ans=0.2 2024-08-14 11:58:57,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2643020.0, ans=0.125 2024-08-14 11:59:02,077 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 11:59:05,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2643120.0, ans=0.125 2024-08-14 11:59:09,832 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 11:59:27,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2643220.0, ans=0.125 2024-08-14 11:59:30,317 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 11:59:33,955 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2024-08-14 11:59:38,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2643320.0, ans=0.125 2024-08-14 11:59:42,834 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 11:59:43,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2643320.0, ans=0.125 2024-08-14 12:00:00,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=12.0 2024-08-14 12:00:03,600 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3500, loss[loss=0.08642, beats_loss=0.01127, ecapa_loss=0.0001742, whisper_loss=0.07341, over 22180.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01084, ecapa_loss=0.0001563, whisper_loss=0.09006, over 3898142.02 frames. ], batch size: 95, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:00:08,122 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-14 12:00:13,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2643520.0, ans=0.125 2024-08-14 12:00:16,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+01 2.311e+01 2.583e+01 2.814e+01 3.893e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 12:00:45,137 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 12:01:00,674 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 12 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 12:01:10,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2643920.0, ans=15.0 2024-08-14 12:01:23,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2643920.0, ans=0.0 2024-08-14 12:01:26,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3550, loss[loss=0.1202, beats_loss=0.01128, ecapa_loss=0.0001185, whisper_loss=0.1078, over 23859.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01082, ecapa_loss=0.0001543, whisper_loss=0.09044, over 3912370.09 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:01:44,119 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 12:01:50,730 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 13 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 12:01:56,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2644120.0, ans=0.125 2024-08-14 12:02:06,347 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 12:02:14,562 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
27 from LS+wenet, 19 from Vox, 36 from AS
2024-08-14 12:02:35,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2644420.0, ans=0.1
2024-08-14 12:02:36,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2644420.0, ans=0.125
2024-08-14 12:02:45,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0
2024-08-14 12:02:51,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3600, loss[loss=0.1127, beats_loss=0.00834, ecapa_loss=0.0001736, whisper_loss=0.1026, over 15958.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01084, ecapa_loss=0.000154, whisper_loss=0.09048, over 3909191.31 frames. ], batch size: 62, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:03:01,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=2644520.0, ans=0.2
2024-08-14 12:03:01,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0
2024-08-14 12:03:01,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.481e+01 2.628e+01 2.850e+01 4.421e+01, threshold=5.257e+01, percent-clipped=0.0
2024-08-14 12:03:10,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2644620.0, ans=0.0
2024-08-14 12:03:13,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2644620.0, ans=0.2
2024-08-14 12:03:23,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2644720.0, ans=0.125
2024-08-14 12:03:25,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2644720.0, ans=0.125
2024-08-14 12:03:25,275 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 12:03:34,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2644720.0, ans=0.0
2024-08-14 12:04:08,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3650, loss[loss=0.104, beats_loss=0.009882, ecapa_loss=0.0001085, whisper_loss=0.09306, over 24629.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001543, whisper_loss=0.09107, over 3913314.12 frames. ], batch size: 89, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:04:15,014 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 17 from Vox, 21 from AS
2024-08-14 12:04:19,781 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=15.0
2024-08-14 12:05:24,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3700, loss[loss=0.09148, beats_loss=0.0126, ecapa_loss=0.000172, whisper_loss=0.07715, over 22067.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.000154, whisper_loss=0.0907, over 3891060.91 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:05:24,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2645520.0, ans=0.0
2024-08-14 12:05:32,689 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS
2024-08-14 12:05:33,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.298e+01 2.531e+01 2.738e+01 1.071e+02, threshold=5.062e+01, percent-clipped=1.0
2024-08-14 12:05:38,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2645620.0, ans=0.07
2024-08-14 12:05:50,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=22.5
2024-08-14 12:05:56,172 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 11 from Vox, 28 from AS
2024-08-14 12:06:06,678 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 15 from Vox, 39 from AS
2024-08-14 12:06:06,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2645820.0, ans=0.125
2024-08-14 12:06:31,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2645920.0, ans=0.035
2024-08-14 12:06:39,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3750, loss[loss=0.08998, beats_loss=0.01156, ecapa_loss=0.0001671, whisper_loss=0.07675, over 15803.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001547, whisper_loss=0.09069, over 3881874.03 frames. ], batch size: 64, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:06:50,664 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS
2024-08-14 12:07:05,179 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 12 from LS+wenet, 14 from Vox, 28 from AS
2024-08-14 12:07:16,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0
2024-08-14 12:07:20,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2646220.0, ans=0.1
2024-08-14 12:07:31,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2646320.0, ans=0.125
2024-08-14 12:07:39,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2646420.0, ans=0.2
2024-08-14 12:07:47,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2646420.0, ans=0.125
2024-08-14 12:07:56,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3800, loss[loss=0.08942, beats_loss=0.01257, ecapa_loss=0.0001375, whisper_loss=0.07548, over 19330.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01085, ecapa_loss=0.0001556, whisper_loss=0.09049, over 3902835.65 frames. ], batch size: 77, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:08:05,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.378e+01 2.670e+01 2.953e+01 4.426e+01, threshold=5.341e+01, percent-clipped=0.0
2024-08-14 12:08:20,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.60 vs. limit=12.0
2024-08-14 12:08:38,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2646720.0, ans=0.125
2024-08-14 12:08:53,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2646820.0, ans=0.125
2024-08-14 12:09:03,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0
2024-08-14 12:09:14,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3850, loss[loss=0.09195, beats_loss=0.01346, ecapa_loss=0.0001623, whisper_loss=0.07686, over 21949.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001553, whisper_loss=0.09067, over 3921511.42 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:09:17,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2647020.0, ans=0.125
2024-08-14 12:09:56,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2647220.0, ans=0.125
2024-08-14 12:09:56,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0
2024-08-14 12:10:17,828 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 from AS
2024-08-14 12:10:35,108 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3900, loss[loss=0.1061, beats_loss=0.01068, ecapa_loss=0.0001763, whisper_loss=0.09366, over 16676.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001561, whisper_loss=0.09143, over 3907712.84 frames. ], batch size: 69, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:10:39,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=2647520.0, ans=0.2
2024-08-14 12:10:40,312 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 from AS
2024-08-14 12:10:47,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2647520.0, ans=0.1
2024-08-14 12:10:48,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.360e+01 2.691e+01 2.914e+01 3.544e+02, threshold=5.383e+01, percent-clipped=1.0
2024-08-14 12:10:53,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0
2024-08-14 12:11:00,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2647620.0, ans=0.125
2024-08-14 12:11:09,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2647720.0, ans=0.125
2024-08-14 12:11:14,671 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 from AS
2024-08-14 12:11:26,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2024-08-14 12:11:45,165 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 from AS
2024-08-14 12:11:45,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2647920.0, ans=0.09899494936611666
2024-08-14 12:11:46,774 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 31 from Vox, 33 from AS
2024-08-14 12:11:50,887 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 from AS
2024-08-14 12:12:01,285 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 23 from Vox, 30 from AS
2024-08-14 12:12:05,062 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 3950, loss[loss=0.107, beats_loss=0.01191, ecapa_loss=0.000157, whisper_loss=0.09355, over 22918.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001575, whisper_loss=0.09199, over 3889245.08 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:12:23,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0
2024-08-14 12:12:33,510 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 from AS
2024-08-14 12:13:04,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2648320.0, ans=0.125
2024-08-14 12:13:16,591 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 from AS
2024-08-14 12:13:21,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2648320.0, ans=0.05
2024-08-14 12:13:34,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5
2024-08-14 12:13:36,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2648420.0, ans=0.125
2024-08-14 12:13:41,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2648420.0, ans=0.125
2024-08-14 12:13:52,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4000, loss[loss=0.103, beats_loss=0.01384, ecapa_loss=0.0001404, whisper_loss=0.08774, over 23033.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001578, whisper_loss=0.0919, over 3878465.03 frames. ], batch size: 94, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:14:07,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.464e+01 2.683e+01 2.941e+01 4.279e+01, threshold=5.366e+01, percent-clipped=0.0
2024-08-14 12:14:31,691 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 15 from Vox, 41 from AS
2024-08-14 12:14:49,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2648720.0, ans=0.0
2024-08-14 12:14:50,998 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS
2024-08-14 12:15:00,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2648820.0, ans=0.125
2024-08-14 12:15:21,973 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 from AS
2024-08-14 12:15:51,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2649020.0, ans=0.1
2024-08-14 12:15:52,709 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4050, loss[loss=0.1153, beats_loss=0.009101, ecapa_loss=0.0001638, whisper_loss=0.1045, over 15096.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001576, whisper_loss=0.09172, over 3871379.75 frames. ], batch size: 58, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:16:36,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2649220.0, ans=0.125
2024-08-14 12:16:57,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2649320.0, ans=0.1
2024-08-14 12:17:01,949 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 from AS
2024-08-14 12:17:18,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2649420.0, ans=0.125
2024-08-14 12:17:20,840 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4100, loss[loss=0.09254, beats_loss=0.00914, ecapa_loss=0.0001825, whisper_loss=0.08158, over 15247.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001574, whisper_loss=0.09099, over 3860106.16 frames. ], batch size: 64, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:17:27,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2649520.0, ans=0.09899494936611666
2024-08-14 12:17:33,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.297e+01 2.541e+01 2.897e+01 6.382e+01, threshold=5.082e+01, percent-clipped=1.0
2024-08-14 12:17:36,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. limit=10.0
2024-08-14 12:17:42,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2649620.0, ans=0.05
2024-08-14 12:18:43,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0
2024-08-14 12:18:53,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4150, loss[loss=0.1141, beats_loss=0.008179, ecapa_loss=0.0001965, whisper_loss=0.104, over 21022.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01065, ecapa_loss=0.0001586, whisper_loss=0.09178, over 3886029.64 frames. ], batch size: 84, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:19:02,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2650020.0, ans=0.1
2024-08-14 12:19:18,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2650120.0, ans=0.125
2024-08-14 12:19:18,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2650120.0, ans=0.2
2024-08-14 12:19:34,272 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 from AS
2024-08-14 12:19:39,356 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 18 from LS+wenet, 32 from Vox, 36 from AS
2024-08-14 12:20:08,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.52 vs. limit=15.0
2024-08-14 12:20:16,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4200, loss[loss=0.1066, beats_loss=0.01056, ecapa_loss=0.0001595, whisper_loss=0.0944, over 19098.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01066, ecapa_loss=0.0001588, whisper_loss=0.0921, over 3913396.05 frames. ], batch size: 75, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:20:26,409 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 from AS
2024-08-14 12:20:27,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.390e+01 2.581e+01 2.872e+01 4.290e+01, threshold=5.161e+01, percent-clipped=0.0
2024-08-14 12:20:38,169 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=15.0
2024-08-14 12:20:44,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2650620.0, ans=0.125
2024-08-14 12:20:44,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2650620.0, ans=0.125
2024-08-14 12:21:02,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2650820.0, ans=0.125
2024-08-14 12:21:16,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2650820.0, ans=0.0
2024-08-14 12:21:36,308 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4250, loss[loss=0.08164, beats_loss=0.01363, ecapa_loss=9.849e-05, whisper_loss=0.06702, over 15542.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01072, ecapa_loss=0.0001573, whisper_loss=0.09196, over 3914313.90 frames. ], batch size: 60, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:21:37,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2651020.0, ans=0.0
2024-08-14 12:21:47,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2651020.0, ans=0.1
2024-08-14 12:21:52,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2651120.0, ans=0.125
2024-08-14 12:22:03,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2651120.0, ans=0.0
2024-08-14 12:22:09,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2651220.0, ans=0.125
2024-08-14 12:22:44,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2651420.0, ans=0.125
2024-08-14 12:22:44,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2651420.0, ans=0.0
2024-08-14 12:22:55,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2651420.0, ans=0.125
2024-08-14 12:22:57,755 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4300, loss[loss=0.06966, beats_loss=0.0117, ecapa_loss=0.0001283, whisper_loss=0.05667, over 15889.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001573, whisper_loss=0.09136, over 3890942.33 frames. ], batch size: 62, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:22:58,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2651520.0, ans=0.125
2024-08-14 12:23:08,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.461e+01 2.630e+01 3.002e+01 3.746e+02, threshold=5.260e+01, percent-clipped=1.0
2024-08-14 12:23:25,257 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS
2024-08-14 12:23:28,512 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 32 from LS+wenet, 22 from Vox, 26 from AS
2024-08-14 12:23:33,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2651720.0, ans=0.125
2024-08-14 12:23:39,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0
2024-08-14 12:23:54,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0
2024-08-14 12:24:05,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0
2024-08-14 12:24:07,400 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 from AS
2024-08-14 12:24:15,544 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4350, loss[loss=0.1068, beats_loss=0.006122, ecapa_loss=0.0001568, whisper_loss=0.09907, over 15386.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001573, whisper_loss=0.09086, over 3859376.44 frames. ], batch size: 55, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:24:20,455 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 from AS
2024-08-14 12:24:36,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2652120.0, ans=0.1
2024-08-14 12:24:48,728 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 12:25:01,226 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 13 from Vox, 35 from AS
2024-08-14 12:25:03,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2652320.0, ans=0.125
2024-08-14 12:25:05,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2652320.0, ans=0.125
2024-08-14 12:25:07,306 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 from AS
2024-08-14 12:25:13,275 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 from AS
2024-08-14 12:25:17,365 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 from AS
2024-08-14 12:25:30,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4400, loss[loss=0.1084, beats_loss=0.008985, ecapa_loss=0.0001829, whisper_loss=0.09762, over 16079.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001575, whisper_loss=0.09139, over 3870648.27 frames. ], batch size: 66, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:25:36,703 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 25 from Vox, 36 from AS
2024-08-14 12:25:36,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2652520.0, ans=0.125
2024-08-14 12:25:40,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.397e+01 2.574e+01 2.948e+01 5.281e+01, threshold=5.148e+01, percent-clipped=1.0
2024-08-14 12:25:46,675 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 18 from Vox, 22 from AS
2024-08-14 12:25:51,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.70 vs. limit=22.5
2024-08-14 12:25:52,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2652620.0, ans=0.2
2024-08-14 12:25:54,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2652620.0, ans=0.1
2024-08-14 12:25:54,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2652620.0, ans=0.125
2024-08-14 12:26:03,906 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 from AS
2024-08-14 12:26:07,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5
2024-08-14 12:26:17,001 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 from AS
2024-08-14 12:26:17,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2652820.0, ans=0.07
2024-08-14 12:26:22,787 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS
2024-08-14 12:26:41,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2652920.0, ans=0.125
2024-08-14 12:26:43,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4450, loss[loss=0.08584, beats_loss=0.01124, ecapa_loss=0.0001631, whisper_loss=0.07297, over 17438.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.0001577, whisper_loss=0.09172, over 3863864.57 frames. ], batch size: 71, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:27:06,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2653120.0, ans=0.0
2024-08-14 12:27:13,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0
2024-08-14 12:27:16,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2653220.0, ans=0.125
2024-08-14 12:27:23,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2653220.0, ans=0.0
2024-08-14 12:27:42,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0
2024-08-14 12:27:43,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0
2024-08-14 12:27:51,763 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 from AS
2024-08-14 12:27:55,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2653420.0, ans=0.0
2024-08-14 12:27:57,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4500, loss[loss=0.09924, beats_loss=0.01179, ecapa_loss=0.0001437, whisper_loss=0.08601, over 21153.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.0001564, whisper_loss=0.09172, over 3876255.00 frames. ], batch size: 87, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:28:03,225 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.972e-02
2024-08-14 12:28:07,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2653520.0, ans=0.1
2024-08-14 12:28:08,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.507e+01 2.296e+01 2.547e+01 2.865e+01 4.084e+01, threshold=5.093e+01, percent-clipped=0.0
2024-08-14 12:28:53,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2653820.0, ans=0.125
2024-08-14 12:29:03,517 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 from AS
2024-08-14 12:29:13,869 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4550, loss[loss=0.0909, beats_loss=0.01254, ecapa_loss=0.0001439, whisper_loss=0.07693, over 15070.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001564, whisper_loss=0.0913, over 3855925.54 frames. ], batch size: 62, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:29:16,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2654020.0, ans=0.1
2024-08-14 12:29:49,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2654220.0, ans=0.0
2024-08-14 12:29:49,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2654220.0, ans=0.125
2024-08-14 12:30:01,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2654320.0, ans=0.125
2024-08-14 12:30:08,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2654320.0, ans=0.2
2024-08-14 12:30:12,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2654420.0, ans=0.125
2024-08-14 12:30:19,035 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03
2024-08-14 12:30:24,835 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 from AS
2024-08-14 12:30:29,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4600, loss[loss=0.1054, beats_loss=0.009279, ecapa_loss=0.0001658, whisper_loss=0.09447, over 16338.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01061, ecapa_loss=0.0001575, whisper_loss=0.09141, over 3860194.05 frames. ], batch size: 61, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:30:29,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2654520.0, ans=0.125
2024-08-14 12:30:30,982 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 12:30:35,575 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 from AS
2024-08-14 12:30:35,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2654520.0, ans=0.2
2024-08-14 12:30:39,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.344e+01 2.580e+01 2.840e+01 1.542e+02, threshold=5.160e+01, percent-clipped=2.0
2024-08-14 12:30:40,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2654520.0, ans=0.0
2024-08-14 12:30:46,503 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 from AS
2024-08-14 12:30:49,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2654620.0, ans=0.1
2024-08-14 12:30:51,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2654620.0, ans=0.0
2024-08-14 12:31:00,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2654720.0, ans=0.1
2024-08-14 12:31:15,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2654820.0, ans=0.125
2024-08-14 12:31:36,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0
2024-08-14 12:31:40,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2654920.0, ans=0.2
2024-08-14 12:31:46,972 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4650, loss[loss=0.08425, beats_loss=0.01174, ecapa_loss=0.0001461, whisper_loss=0.07105, over 21844.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001576, whisper_loss=0.09151, over 3883758.45 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:32:56,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0
2024-08-14 12:33:01,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0
2024-08-14 12:33:02,824 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 from AS
2024-08-14 12:33:08,337 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 30 from Vox, 38 from AS
2024-08-14 12:33:11,917 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 from AS
2024-08-14 12:33:13,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4700, loss[loss=0.1152, beats_loss=0.01081, ecapa_loss=0.0001417, whisper_loss=0.103, over 20551.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001573, whisper_loss=0.09161, over 3886240.14 frames. ], batch size: 81, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:33:16,650 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 8 from LS+wenet, 17 from Vox, 33 from AS
2024-08-14 12:33:16,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2655520.0, ans=0.1
2024-08-14 12:33:25,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.499e+01 2.871e+01 5.538e+01, threshold=4.999e+01, percent-clipped=1.0
2024-08-14 12:34:25,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2655920.0, ans=0.0
2024-08-14 12:34:33,331 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 14 from Vox, 39 from AS
2024-08-14 12:34:37,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4750, loss[loss=0.1032, beats_loss=0.01161, ecapa_loss=0.0001739, whisper_loss=0.08981, over 21550.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001566, whisper_loss=0.09139, over 3918599.69 frames. ], batch size: 91, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:34:38,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2656020.0, ans=0.2
2024-08-14 12:34:38,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5
2024-08-14 12:34:45,082 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 from AS
2024-08-14 12:34:48,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2656020.0, ans=0.5
2024-08-14 12:34:49,720 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 20 from LS+wenet, 35 from Vox, 38 from AS
2024-08-14 12:35:23,214 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 from AS
2024-08-14 12:35:24,718 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 from AS
2024-08-14 12:35:25,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2656320.0, ans=0.2
2024-08-14 12:35:38,115 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 13 from LS+wenet, 21 from Vox, 30 from AS
2024-08-14 12:35:42,125 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=12.0
2024-08-14 12:35:51,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4800, loss[loss=0.1089, beats_loss=0.009343, ecapa_loss=0.0001798, whisper_loss=0.09779, over 17267.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001574, whisper_loss=0.09071, over 3928731.10 frames. ], batch size: 72, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:36:02,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.391e+01 2.624e+01 2.971e+01 4.050e+02, threshold=5.248e+01, percent-clipped=2.0
2024-08-14 12:36:02,607 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS
2024-08-14 12:36:18,364 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 from AS
2024-08-14 12:36:19,914 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS
2024-08-14 12:36:29,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2656720.0, ans=0.125
2024-08-14 12:36:46,678 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 from AS
2024-08-14 12:37:05,732 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4850, loss[loss=0.1293, beats_loss=0.008682, ecapa_loss=0.0001366, whisper_loss=0.1193, over 20638.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001581, whisper_loss=0.09119, over 3920044.00 frames. ], batch size: 78, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:37:07,400 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 22 from Vox, 31 from AS
2024-08-14 12:37:18,161 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts.
31 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 12:37:27,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2657120.0, ans=0.0 2024-08-14 12:37:29,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-08-14 12:37:48,490 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-14 12:37:54,564 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 12:37:57,948 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 12:38:12,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2657420.0, ans=0.2 2024-08-14 12:38:15,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2657420.0, ans=0.125 2024-08-14 12:38:17,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2657420.0, ans=0.125 2024-08-14 12:38:20,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4900, loss[loss=0.07468, beats_loss=0.01227, ecapa_loss=0.0001707, whisper_loss=0.0607, over 17682.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001568, whisper_loss=0.09155, over 3914331.22 frames. 
], batch size: 74, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:38:31,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.386e+01 2.578e+01 2.812e+01 7.156e+01, threshold=5.157e+01, percent-clipped=2.0 2024-08-14 12:38:38,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2657620.0, ans=0.125 2024-08-14 12:38:41,237 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:38:56,183 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 12:39:08,502 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-14 12:39:13,166 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 12:39:26,447 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 12:39:36,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2658020.0, ans=0.125 2024-08-14 12:39:36,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 4950, loss[loss=0.09697, beats_loss=0.01103, ecapa_loss=0.0001621, whisper_loss=0.08432, over 22173.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001566, whisper_loss=0.09116, over 3883409.60 frames. ], batch size: 88, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:39:44,687 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-14 12:39:45,352 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 12:39:50,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2658120.0, ans=0.125 2024-08-14 12:39:56,054 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 12:40:18,342 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 12:40:39,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2658420.0, ans=0.2 2024-08-14 12:40:50,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5000, loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001618, whisper_loss=0.08935, over 13940.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01062, ecapa_loss=0.0001563, whisper_loss=0.09202, over 3871902.90 frames. ], batch size: 54, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:40:55,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. limit=10.0 2024-08-14 12:40:57,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2658520.0, ans=0.0 2024-08-14 12:41:00,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.269e+01 2.546e+01 2.965e+01 4.784e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-14 12:41:03,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2658520.0, ans=0.125 2024-08-14 12:41:13,425 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 12:41:20,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2658720.0, ans=0.1 2024-08-14 12:41:32,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2658720.0, ans=0.07 2024-08-14 12:41:52,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2658920.0, ans=0.1 2024-08-14 12:41:55,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2658920.0, ans=0.2 2024-08-14 12:41:59,855 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 12:42:05,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5050, loss[loss=0.1015, beats_loss=0.008934, ecapa_loss=0.0001755, whisper_loss=0.09078, over 16827.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01065, ecapa_loss=0.0001557, whisper_loss=0.09206, over 3856829.82 frames. ], batch size: 67, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:42:12,159 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 12:42:27,127 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 12:42:31,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.31 vs. limit=10.0 2024-08-14 12:42:32,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.88 vs. 
limit=5.0 2024-08-14 12:42:34,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2659120.0, ans=0.04949747468305833 2024-08-14 12:42:38,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2659220.0, ans=0.125 2024-08-14 12:42:41,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2659220.0, ans=0.125 2024-08-14 12:43:17,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2659420.0, ans=0.125 2024-08-14 12:43:21,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5100, loss[loss=0.1205, beats_loss=0.008715, ecapa_loss=0.0001625, whisper_loss=0.1101, over 13372.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01068, ecapa_loss=0.0001554, whisper_loss=0.09226, over 3873568.60 frames. ], batch size: 53, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:43:28,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2659520.0, ans=0.1 2024-08-14 12:43:32,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.354e+01 2.635e+01 2.968e+01 4.253e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-14 12:43:32,394 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 12:43:53,058 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 40 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 12:43:57,256 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 12:43:59,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.60 vs. 
limit=22.5 2024-08-14 12:44:23,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2659920.0, ans=0.0 2024-08-14 12:44:36,134 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5150, loss[loss=0.0767, beats_loss=0.01266, ecapa_loss=0.0001521, whisper_loss=0.06251, over 16308.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01065, ecapa_loss=0.000155, whisper_loss=0.09246, over 3885199.24 frames. ], batch size: 67, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:44:59,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2660120.0, ans=0.0 2024-08-14 12:45:13,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2660220.0, ans=0.0 2024-08-14 12:45:16,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-14 12:45:25,083 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 14 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 12:45:28,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2660320.0, ans=0.0 2024-08-14 12:45:28,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 12:45:34,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. 
limit=15.0 2024-08-14 12:45:43,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2660420.0, ans=0.025 2024-08-14 12:45:49,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2660420.0, ans=0.0 2024-08-14 12:45:51,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5200, loss[loss=0.1145, beats_loss=0.009269, ecapa_loss=0.0001719, whisper_loss=0.1035, over 14608.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001544, whisper_loss=0.09206, over 3855588.62 frames. ], batch size: 54, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:45:58,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2660520.0, ans=0.125 2024-08-14 12:46:02,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.363e+01 2.791e+01 3.410e+01 2.422e+02, threshold=5.583e+01, percent-clipped=4.0 2024-08-14 12:46:10,539 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 12:46:18,349 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-14 12:46:18,685 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.804e+05 2024-08-14 12:46:25,514 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 38 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 12:46:26,935 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 12:46:48,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.61 vs. 
limit=15.0 2024-08-14 12:46:54,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2660920.0, ans=0.05 2024-08-14 12:47:04,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2660920.0, ans=0.125 2024-08-14 12:47:06,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5250, loss[loss=0.1234, beats_loss=0.009039, ecapa_loss=0.0001611, whisper_loss=0.1127, over 23806.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001552, whisper_loss=0.09167, over 3849403.86 frames. ], batch size: 93, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:47:08,454 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 12:47:09,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5 2024-08-14 12:47:15,583 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 16 from LS+wenet, 29 from Vox, 48 fro AS 2024-08-14 12:47:16,920 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-14 12:47:32,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2661120.0, ans=0.1 2024-08-14 12:47:34,042 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-14 12:47:41,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2661220.0, ans=0.0 2024-08-14 12:47:52,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.20 vs. 
limit=15.0 2024-08-14 12:47:56,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2661320.0, ans=0.2 2024-08-14 12:48:04,206 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 12:48:04,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5 2024-08-14 12:48:13,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2661420.0, ans=0.2 2024-08-14 12:48:19,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2661520.0, ans=0.0 2024-08-14 12:48:20,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5300, loss[loss=0.1205, beats_loss=0.007107, ecapa_loss=0.000125, whisper_loss=0.1122, over 15263.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001555, whisper_loss=0.09172, over 3842652.95 frames. ], batch size: 55, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:48:25,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. 
limit=15.0 2024-08-14 12:48:29,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.292e+01 2.528e+01 2.841e+01 9.142e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-14 12:48:48,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2661720.0, ans=0.125 2024-08-14 12:48:52,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2661720.0, ans=0.0 2024-08-14 12:49:32,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2662020.0, ans=0.0 2024-08-14 12:49:33,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5350, loss[loss=0.06036, beats_loss=0.01412, ecapa_loss=0.000132, whisper_loss=0.04492, over 14346.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001554, whisper_loss=0.09122, over 3873865.23 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:49:40,020 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 12:49:45,722 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07321541011333466, model_norm_threshold=50.560279846191406 2024-08-14 12:49:45,889 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.720e+04, grad_sumsq=6.720e+04, orig_rms_sq=1.000e+00 2024-08-14 12:50:20,924 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 12:50:27,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2662320.0, ans=0.2 2024-08-14 12:50:30,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2662320.0, ans=0.125 2024-08-14 12:50:48,789 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5400, loss[loss=0.09487, beats_loss=0.01103, ecapa_loss=0.0001622, whisper_loss=0.08222, over 18074.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001545, whisper_loss=0.09113, over 3867345.88 frames. ], batch size: 73, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:50:58,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.310e+01 2.504e+01 2.679e+01 6.906e+02, threshold=5.009e+01, percent-clipped=1.0 2024-08-14 12:51:08,761 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 12:51:12,546 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 12:51:19,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2662720.0, ans=0.125 2024-08-14 12:51:23,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2662720.0, ans=0.0 2024-08-14 12:51:33,611 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-14 12:51:36,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. 
limit=10.0 2024-08-14 12:51:55,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2662920.0, ans=0.2 2024-08-14 12:52:00,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5450, loss[loss=0.1184, beats_loss=0.01067, ecapa_loss=0.0001667, whisper_loss=0.1061, over 19928.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001545, whisper_loss=0.09103, over 3866959.41 frames. ], batch size: 79, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:52:11,453 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 12:52:11,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2663020.0, ans=0.1 2024-08-14 12:52:11,956 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.30 vs. limit=6.0 2024-08-14 12:52:20,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2663120.0, ans=0.1 2024-08-14 12:52:38,039 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 12:52:43,865 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 12:53:07,531 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 12:53:09,052 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 12:53:14,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5500, loss[loss=0.1074, beats_loss=0.01082, ecapa_loss=0.000143, whisper_loss=0.09516, over 18784.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001544, whisper_loss=0.09101, over 3870390.00 frames. 
], batch size: 76, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:53:24,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.510e+01 2.706e+01 3.048e+01 6.260e+01, threshold=5.412e+01, percent-clipped=1.0 2024-08-14 12:53:33,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-14 12:53:40,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2663620.0, ans=0.125 2024-08-14 12:53:50,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2663720.0, ans=0.0 2024-08-14 12:53:53,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2663720.0, ans=0.125 2024-08-14 12:54:02,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2663820.0, ans=0.1 2024-08-14 12:54:05,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2663820.0, ans=0.0 2024-08-14 12:54:12,266 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 18 from LS+wenet, 35 from Vox, 38 fro AS 2024-08-14 12:54:12,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2663920.0, ans=0.1 2024-08-14 12:54:18,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2663920.0, ans=0.125 2024-08-14 12:54:28,850 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5550, loss[loss=0.09356, beats_loss=0.01125, ecapa_loss=0.0001645, whisper_loss=0.08066, over 18415.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001549, whisper_loss=0.09024, over 3879054.11 frames. ], batch size: 77, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:54:32,108 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-14 12:54:44,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2664120.0, ans=0.1 2024-08-14 12:54:51,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2664120.0, ans=10.0 2024-08-14 12:55:19,858 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 12:55:21,306 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 12:55:43,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5600, loss[loss=0.1162, beats_loss=0.01092, ecapa_loss=0.0001562, whisper_loss=0.1038, over 20878.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001547, whisper_loss=0.0902, over 3881315.20 frames. ], batch size: 84, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:55:54,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.335e+01 2.676e+01 3.034e+01 3.132e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-14 12:56:31,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2664820.0, ans=0.0 2024-08-14 12:56:42,011 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-14 12:56:57,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5650, loss[loss=0.1001, beats_loss=0.01266, ecapa_loss=0.0001553, whisper_loss=0.08587, over 21925.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001546, whisper_loss=0.09039, over 3909608.46 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:57:04,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2665020.0, ans=0.125 2024-08-14 12:57:05,454 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 12:57:08,347 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 12:57:12,591 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 12:57:28,628 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 12:57:30,745 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=12.0 2024-08-14 12:57:31,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-08-14 12:57:39,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=15.0 2024-08-14 12:57:56,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2665420.0, ans=0.125 2024-08-14 12:58:01,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2665420.0, ans=0.0 2024-08-14 12:58:10,369 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5700, loss[loss=0.09917, beats_loss=0.01406, ecapa_loss=0.0001029, whisper_loss=0.08408, over 19548.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001561, whisper_loss=0.09046, over 3881966.47 frames. ], batch size: 75, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:58:20,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.440e+01 2.658e+01 3.007e+01 5.166e+01, threshold=5.317e+01, percent-clipped=0.0 2024-08-14 12:58:23,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2024-08-14 12:58:30,639 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-14 12:58:34,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-14 12:58:38,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2665620.0, ans=0.125 2024-08-14 12:58:46,696 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 12:59:01,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2665820.0, ans=0.2 2024-08-14 12:59:13,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2665920.0, ans=0.125 2024-08-14 12:59:23,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2665920.0, ans=0.0 2024-08-14 12:59:24,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2666020.0, ans=0.125 2024-08-14 12:59:25,382 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5750, loss[loss=0.09321, beats_loss=0.0131, ecapa_loss=0.0001096, whisper_loss=0.07901, over 14261.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001546, whisper_loss=0.09057, over 3887904.90 frames. ], batch size: 55, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:59:26,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0 2024-08-14 12:59:31,570 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 37 from Vox, 25 fro AS 2024-08-14 12:59:31,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2666020.0, ans=0.125 2024-08-14 12:59:49,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=22.5 2024-08-14 12:59:50,001 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 12:59:52,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0 2024-08-14 12:59:57,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2666220.0, ans=0.1 2024-08-14 12:59:58,897 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 12:59:59,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2666220.0, ans=0.125 2024-08-14 13:00:10,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2666320.0, ans=0.0 2024-08-14 13:00:14,844 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
21 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 13:00:40,032 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5800, loss[loss=0.09495, beats_loss=0.01026, ecapa_loss=0.0001449, whisper_loss=0.08324, over 17704.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.000155, whisper_loss=0.09079, over 3898644.21 frames. ], batch size: 70, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:00:42,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2666520.0, ans=0.1 2024-08-14 13:00:43,523 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-14 13:00:45,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-08-14 13:00:50,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.337e+01 2.671e+01 3.010e+01 5.088e+01, threshold=5.343e+01, percent-clipped=0.0 2024-08-14 13:00:54,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-08-14 13:01:08,622 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 15 from Vox, 55 fro AS 2024-08-14 13:01:11,934 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-14 13:01:20,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2666720.0, ans=0.125 2024-08-14 13:01:43,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2666920.0, ans=0.0 2024-08-14 13:01:44,622 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 13:01:54,787 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5850, loss[loss=0.1166, beats_loss=0.008023, ecapa_loss=0.0001896, whisper_loss=0.1066, over 19102.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001554, whisper_loss=0.09003, over 3910503.59 frames. ], batch size: 81, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:02:24,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2667220.0, ans=0.0 2024-08-14 13:02:32,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2024-08-14 13:02:34,925 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-14 13:02:40,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2667320.0, ans=0.125 2024-08-14 13:02:42,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2667320.0, ans=0.04949747468305833 2024-08-14 13:02:48,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-08-14 13:02:49,213 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 13:02:54,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2667420.0, ans=0.125 2024-08-14 13:03:08,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2667520.0, ans=0.125 2024-08-14 13:03:08,975 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5900, loss[loss=0.08455, beats_loss=0.01386, ecapa_loss=0.0001686, whisper_loss=0.06901, over 15378.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.000156, whisper_loss=0.09077, over 3913014.93 frames. ], batch size: 65, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:03:12,068 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 13:03:18,976 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.335e+01 2.608e+01 2.996e+01 4.185e+01, threshold=5.216e+01, percent-clipped=0.0 2024-08-14 13:03:32,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2667620.0, ans=0.2 2024-08-14 13:03:40,293 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:03:45,977 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 18 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 13:04:15,645 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-14 13:04:16,029 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.035e-03 2024-08-14 13:04:22,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 5950, loss[loss=0.07367, beats_loss=0.01239, ecapa_loss=0.0001358, whisper_loss=0.05993, over 14085.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01084, ecapa_loss=0.0001562, whisper_loss=0.0898, over 3910958.97 frames. ], batch size: 57, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:04:48,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2668120.0, ans=0.125 2024-08-14 13:04:54,624 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-14 13:05:06,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2668320.0, ans=0.0 2024-08-14 13:05:08,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2668320.0, ans=0.125 2024-08-14 13:05:18,255 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 13:05:37,235 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6000, loss[loss=0.09024, beats_loss=0.01014, ecapa_loss=0.000174, whisper_loss=0.07835, over 18121.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0109, ecapa_loss=0.0001547, whisper_loss=0.09047, over 3926043.91 frames. ], batch size: 76, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:05:37,235 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 13:06:13,770 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000548, whisper_loss=0.2476, over 922467.00 frames. 2024-08-14 13:06:29,888 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on SV_voxceleb1: loss=0.004318, beats_loss=0, ecapa_loss=0.0004318, whisper_loss=0, over 939242.00 frames. 
2024-08-14 13:06:55,428 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1758, 3.9575, 4.0515, 4.1143], device='cuda:1') 2024-08-14 13:08:18,952 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on AT_audioset: loss=0.02353, beats_loss=0.02353, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 13:08:18,956 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 13:08:22,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2668520.0, ans=0.0 2024-08-14 13:08:25,212 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 13:08:29,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.205e+01 2.512e+01 2.812e+01 4.887e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-14 13:08:31,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2668520.0, ans=0.2 2024-08-14 13:08:53,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2668720.0, ans=0.0 2024-08-14 13:08:53,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2668720.0, ans=0.0 2024-08-14 13:08:58,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2668720.0, ans=0.0 2024-08-14 13:09:10,784 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0662800669670105, model_norm_threshold=50.23476028442383 2024-08-14 13:09:10,982 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.127e+05, grad_sumsq=1.141e+07, orig_rms_sq=9.876e-03 2024-08-14 13:09:16,716 INFO 
[scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2668820.0, ans=0.2 2024-08-14 13:09:16,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2024-08-14 13:09:19,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2668920.0, ans=0.2 2024-08-14 13:09:20,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2668920.0, ans=0.125 2024-08-14 13:09:22,501 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:09:34,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6050, loss[loss=0.0847, beats_loss=0.01044, ecapa_loss=0.0001708, whisper_loss=0.07256, over 21919.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.000155, whisper_loss=0.09075, over 3925115.13 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:09:49,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2669120.0, ans=0.1 2024-08-14 13:09:50,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2669120.0, ans=0.0 2024-08-14 13:09:55,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2669120.0, ans=0.125 2024-08-14 13:09:57,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.08 vs. 
limit=12.0 2024-08-14 13:10:00,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2669120.0, ans=0.2 2024-08-14 13:10:08,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2669220.0, ans=0.125 2024-08-14 13:10:14,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-14 13:10:40,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2024-08-14 13:10:48,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6100, loss[loss=0.1052, beats_loss=0.01293, ecapa_loss=0.0001638, whisper_loss=0.09064, over 18750.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001558, whisper_loss=0.09087, over 3885138.11 frames. ], batch size: 76, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:10:59,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.399e+01 2.783e+01 3.218e+01 7.579e+02, threshold=5.567e+01, percent-clipped=5.0 2024-08-14 13:11:04,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2669620.0, ans=0.125 2024-08-14 13:11:04,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2669620.0, ans=10.0 2024-08-14 13:11:10,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.81 vs. 
limit=15.0 2024-08-14 13:11:14,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2669620.0, ans=0.0 2024-08-14 13:11:22,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2669720.0, ans=0.125 2024-08-14 13:11:22,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2024-08-14 13:11:42,144 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.062e+05 2024-08-14 13:11:46,447 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 13:11:48,296 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:12:04,170 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6150, loss[loss=0.1222, beats_loss=0.009593, ecapa_loss=0.0001531, whisper_loss=0.111, over 22915.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001555, whisper_loss=0.09075, over 3910454.99 frames. ], batch size: 90, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:12:06,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2670020.0, ans=0.125 2024-08-14 13:12:14,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2670020.0, ans=0.125 2024-08-14 13:13:08,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. 
limit=10.0 2024-08-14 13:13:09,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2670420.0, ans=0.125 2024-08-14 13:13:18,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6200, loss[loss=0.1134, beats_loss=0.008384, ecapa_loss=0.0001803, whisper_loss=0.1032, over 13864.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001551, whisper_loss=0.09112, over 3903990.81 frames. ], batch size: 55, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:13:19,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2670520.0, ans=0.125 2024-08-14 13:13:28,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.359e+01 2.589e+01 2.919e+01 1.541e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-14 13:13:34,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2670620.0, ans=0.025 2024-08-14 13:13:53,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2024-08-14 13:14:01,546 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 13:14:06,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2670820.0, ans=0.07 2024-08-14 13:14:16,434 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-14 13:14:24,838 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-14 13:14:32,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6250, loss[loss=0.109, beats_loss=0.0127, ecapa_loss=0.0001453, whisper_loss=0.09482, over 22557.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001553, whisper_loss=0.09087, over 3893381.13 frames. ], batch size: 90, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:14:32,822 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-08-14 13:14:33,639 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 13:14:35,071 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-14 13:14:40,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-08-14 13:15:32,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.07 vs. limit=10.0 2024-08-14 13:15:45,371 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6300, loss[loss=0.1103, beats_loss=0.009467, ecapa_loss=0.000184, whisper_loss=0.09902, over 22570.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001559, whisper_loss=0.09048, over 3883233.70 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:15:47,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. 
limit=22.5 2024-08-14 13:15:51,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2671520.0, ans=0.0 2024-08-14 13:15:53,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2671520.0, ans=0.0 2024-08-14 13:15:54,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2671520.0, ans=0.0 2024-08-14 13:15:57,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.285e+01 2.511e+01 2.818e+01 8.993e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-14 13:16:00,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2671620.0, ans=0.125 2024-08-14 13:16:23,938 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 13:16:36,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-08-14 13:16:40,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2671820.0, ans=0.125 2024-08-14 13:16:59,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2672020.0, ans=0.125 2024-08-14 13:17:00,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6350, loss[loss=0.09485, beats_loss=0.01179, ecapa_loss=0.000199, whisper_loss=0.08106, over 20779.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001568, whisper_loss=0.09091, over 3874706.34 frames. 
], batch size: 90, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:18:09,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2672420.0, ans=0.0 2024-08-14 13:18:14,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6400, loss[loss=0.09819, beats_loss=0.01145, ecapa_loss=0.0001456, whisper_loss=0.08528, over 17678.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.0001559, whisper_loss=0.09127, over 3892123.30 frames. ], batch size: 70, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:18:25,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.344e+01 2.584e+01 2.860e+01 4.850e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-14 13:18:39,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2672620.0, ans=0.125 2024-08-14 13:19:28,486 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6450, loss[loss=0.1175, beats_loss=0.008934, ecapa_loss=0.0001722, whisper_loss=0.1068, over 22859.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01073, ecapa_loss=0.0001567, whisper_loss=0.09164, over 3883942.68 frames. ], batch size: 94, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:19:29,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-14 13:19:34,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2673020.0, ans=0.0 2024-08-14 13:19:37,734 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 13:19:40,380 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 13:19:45,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-14 13:19:47,479 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 13:19:56,457 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 13:20:09,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-08-14 13:20:25,851 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 13:20:41,397 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6500, loss[loss=0.1033, beats_loss=0.01176, ecapa_loss=0.0001423, whisper_loss=0.09008, over 22347.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01065, ecapa_loss=0.0001573, whisper_loss=0.0923, over 3882798.61 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:20:51,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2673520.0, ans=0.2 2024-08-14 13:20:53,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.410e+01 2.661e+01 2.982e+01 1.028e+02, threshold=5.322e+01, percent-clipped=1.0 2024-08-14 13:20:59,407 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 13:21:02,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2673620.0, ans=0.125 2024-08-14 13:21:11,553 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 13:21:12,911 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 13:21:22,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2673720.0, ans=0.125 2024-08-14 13:21:26,556 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0 2024-08-14 13:21:28,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.34 vs. limit=10.0 2024-08-14 13:21:36,503 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 13:21:38,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2673820.0, ans=0.0 2024-08-14 13:21:55,648 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6550, loss[loss=0.09701, beats_loss=0.006858, ecapa_loss=0.000203, whisper_loss=0.08812, over 17951.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01069, ecapa_loss=0.0001567, whisper_loss=0.09201, over 3870874.09 frames. ], batch size: 75, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:21:57,210 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 13:22:05,091 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.769e+01 2024-08-14 13:22:10,746 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
24 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 13:22:17,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2674120.0, ans=0.125 2024-08-14 13:22:29,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2674220.0, ans=0.125 2024-08-14 13:22:35,365 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 18 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 13:22:40,145 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.390e-01 2024-08-14 13:22:44,366 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:22:48,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2674320.0, ans=0.2 2024-08-14 13:22:48,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-14 13:23:04,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2674420.0, ans=0.0 2024-08-14 13:23:08,624 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6600, loss[loss=0.09416, beats_loss=0.01269, ecapa_loss=0.0001421, whisper_loss=0.08006, over 22998.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01069, ecapa_loss=0.0001567, whisper_loss=0.09167, over 3903671.05 frames. 
], batch size: 95, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:23:09,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2674520.0, ans=0.2 2024-08-14 13:23:20,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.726e+01 3.181e+01 5.119e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-14 13:23:29,972 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 13:23:46,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2674720.0, ans=0.0 2024-08-14 13:23:48,006 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 13:24:11,434 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 30 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 13:24:17,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2674920.0, ans=0.125 2024-08-14 13:24:20,514 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 13:24:21,651 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6650, loss[loss=0.09317, beats_loss=0.01249, ecapa_loss=0.0001664, whisper_loss=0.07902, over 15650.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001567, whisper_loss=0.09109, over 3887233.14 frames. ], batch size: 66, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:24:25,034 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 13:24:26,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2675020.0, ans=0.125 2024-08-14 13:24:33,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2675020.0, ans=0.125 2024-08-14 13:24:38,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2675120.0, ans=0.125 2024-08-14 13:24:48,485 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 13:24:49,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2675120.0, ans=0.2 2024-08-14 13:24:56,083 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-14 13:25:02,101 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 13:25:06,597 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 13:25:13,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2675320.0, ans=0.125 2024-08-14 13:25:15,643 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 13:25:21,438 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 13:25:30,035 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 23 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-14 13:25:35,602 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6700, loss[loss=0.1156, beats_loss=0.01046, ecapa_loss=0.0001131, whisper_loss=0.104, over 18121.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001561, whisper_loss=0.09107, over 3878862.65 frames. ], batch size: 69, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:25:47,497 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.392e+01 2.630e+01 2.889e+01 1.018e+02, threshold=5.259e+01, percent-clipped=2.0 2024-08-14 13:25:51,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2675620.0, ans=0.1 2024-08-14 13:25:55,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2675620.0, ans=0.2 2024-08-14 13:26:15,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2675720.0, ans=0.5 2024-08-14 13:26:17,216 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 13:26:26,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2675820.0, ans=0.09899494936611666 2024-08-14 13:26:32,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2675820.0, ans=0.0 2024-08-14 13:26:49,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-14 13:26:49,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6750, loss[loss=0.105, beats_loss=0.009285, ecapa_loss=0.0001675, whisper_loss=0.09403, over 18692.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001565, whisper_loss=0.09201, over 3874343.67 frames. 
], batch size: 74, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:26:57,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2024-08-14 13:26:57,915 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 13:27:08,157 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 13:27:18,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2676220.0, ans=0.125 2024-08-14 13:27:29,007 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:27:37,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=15.0 2024-08-14 13:27:44,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2676320.0, ans=0.125 2024-08-14 13:27:56,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2024-08-14 13:27:57,057 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 13:28:02,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6800, loss[loss=0.09557, beats_loss=0.01219, ecapa_loss=0.0001444, whisper_loss=0.08193, over 22038.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.0001572, whisper_loss=0.09159, over 3888227.09 frames. 
], batch size: 92, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:28:14,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.378e+01 2.676e+01 3.043e+01 8.013e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-14 13:28:21,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2676620.0, ans=0.0 2024-08-14 13:28:27,463 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-14 13:28:32,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2676720.0, ans=0.0 2024-08-14 13:28:32,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2676720.0, ans=0.125 2024-08-14 13:28:51,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2676820.0, ans=0.125 2024-08-14 13:28:58,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2676820.0, ans=0.125 2024-08-14 13:29:02,699 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 13:29:17,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6850, loss[loss=0.1038, beats_loss=0.01009, ecapa_loss=0.0001433, whisper_loss=0.09233, over 20566.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001567, whisper_loss=0.09133, over 3855294.94 frames. 
], batch size: 82, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:29:27,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2677020.0, ans=0.125 2024-08-14 13:29:53,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2677220.0, ans=0.125 2024-08-14 13:29:56,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=12.0 2024-08-14 13:29:59,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2677320.0, ans=0.125 2024-08-14 13:30:00,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2677320.0, ans=0.125 2024-08-14 13:30:04,097 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.46 vs. limit=22.5 2024-08-14 13:30:06,334 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 13:30:09,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2677320.0, ans=0.0 2024-08-14 13:30:13,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2677420.0, ans=0.125 2024-08-14 13:30:18,898 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
29 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 13:30:26,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2677420.0, ans=0.125 2024-08-14 13:30:28,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6900, loss[loss=0.1243, beats_loss=0.00912, ecapa_loss=0.0001646, whisper_loss=0.1136, over 23033.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001573, whisper_loss=0.091, over 3895496.64 frames. ], batch size: 88, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:30:39,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.298e+01 2.502e+01 2.840e+01 6.631e+01, threshold=5.005e+01, percent-clipped=1.0 2024-08-14 13:30:52,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2677620.0, ans=0.125 2024-08-14 13:30:55,621 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 13:30:56,845 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 13:31:00,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2677720.0, ans=10.0 2024-08-14 13:31:21,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-08-14 13:31:27,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2677920.0, ans=0.1 2024-08-14 13:31:39,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 6950, loss[loss=0.1188, beats_loss=0.01045, ecapa_loss=0.0001534, whisper_loss=0.1068, over 16754.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001561, whisper_loss=0.09102, over 3920864.44 frames. 
], batch size: 65, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:31:41,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2678020.0, ans=0.07 2024-08-14 13:31:43,635 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 13:31:45,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2678020.0, ans=0.05 2024-08-14 13:31:46,558 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 13:31:59,707 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 13:32:05,335 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 17 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-14 13:32:06,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2678220.0, ans=0.125 2024-08-14 13:32:07,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.37 vs. limit=10.0 2024-08-14 13:32:34,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-14 13:32:50,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-08-14 13:32:50,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7000, loss[loss=0.1115, beats_loss=0.01019, ecapa_loss=0.0001834, whisper_loss=0.09942, over 20333.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001564, whisper_loss=0.09056, over 3891715.15 frames. 
], batch size: 84, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:32:58,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2678520.0, ans=0.0 2024-08-14 13:33:01,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.255e+01 2.474e+01 2.854e+01 4.338e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-14 13:33:04,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2678620.0, ans=0.125 2024-08-14 13:33:12,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2678620.0, ans=0.125 2024-08-14 13:33:18,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2678720.0, ans=0.04949747468305833 2024-08-14 13:33:24,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2678720.0, ans=0.1 2024-08-14 13:33:38,691 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.90 vs. limit=22.5 2024-08-14 13:33:39,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2678820.0, ans=0.2 2024-08-14 13:33:46,670 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-14 13:33:53,842 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:34:01,704 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7050, loss[loss=0.09041, beats_loss=0.01275, ecapa_loss=0.0001751, whisper_loss=0.07591, over 21429.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001573, whisper_loss=0.09052, over 3877468.40 frames. 
], batch size: 92, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:34:07,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2679020.0, ans=0.1 2024-08-14 13:34:29,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=2679220.0, ans=0.2 2024-08-14 13:34:39,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2679220.0, ans=0.125 2024-08-14 13:34:45,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2679320.0, ans=0.125 2024-08-14 13:35:13,259 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7100, loss[loss=0.09224, beats_loss=0.01047, ecapa_loss=0.000187, whisper_loss=0.0799, over 13992.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001558, whisper_loss=0.09085, over 3887428.38 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:35:24,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.302e+01 2.502e+01 2.737e+01 3.925e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-14 13:35:39,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2679620.0, ans=0.0 2024-08-14 13:35:41,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2679720.0, ans=0.09899494936611666 2024-08-14 13:35:47,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2679720.0, ans=0.0 2024-08-14 13:35:49,066 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 13:35:58,894 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-14 13:36:04,099 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-14 13:36:14,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2679920.0, ans=0.2 2024-08-14 13:36:21,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2679920.0, ans=0.125 2024-08-14 13:36:22,597 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 13:36:27,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2679920.0, ans=0.125 2024-08-14 13:36:30,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7150, loss[loss=0.1104, beats_loss=0.009746, ecapa_loss=0.0001552, whisper_loss=0.09913, over 22446.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0108, ecapa_loss=0.0001548, whisper_loss=0.09052, over 3912575.02 frames. ], batch size: 88, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:36:47,550 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 13:36:52,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2680120.0, ans=0.125 2024-08-14 13:37:03,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2680120.0, ans=0.0 2024-08-14 13:37:16,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.86 vs. 
limit=22.5 2024-08-14 13:37:28,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2680320.0, ans=0.125 2024-08-14 13:37:45,787 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 13:37:52,785 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7200, loss[loss=0.1034, beats_loss=0.01123, ecapa_loss=0.0001361, whisper_loss=0.09084, over 22825.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001543, whisper_loss=0.09013, over 3893827.69 frames. ], batch size: 91, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:38:01,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.04 vs. limit=22.5 2024-08-14 13:38:04,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.341e+01 2.648e+01 2.948e+01 9.250e+01, threshold=5.295e+01, percent-clipped=2.0 2024-08-14 13:38:15,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2680620.0, ans=0.0 2024-08-14 13:38:17,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-14 13:38:30,672 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 13:38:32,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2680720.0, ans=0.125 2024-08-14 13:38:38,318 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 13:38:43,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2680820.0, ans=0.125 2024-08-14 13:38:44,395 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 13:39:07,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7250, loss[loss=0.1224, beats_loss=0.01069, ecapa_loss=0.000165, whisper_loss=0.11, over 21522.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01077, ecapa_loss=0.0001547, whisper_loss=0.09001, over 3889876.28 frames. ], batch size: 86, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:39:11,344 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 13:39:22,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2681120.0, ans=0.2 2024-08-14 13:39:45,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-14 13:39:52,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2681320.0, ans=0.125 2024-08-14 13:39:55,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2681320.0, ans=0.125 2024-08-14 13:39:55,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2681320.0, ans=0.125 2024-08-14 13:40:04,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=12.0 2024-08-14 13:40:14,409 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 13:40:19,884 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 13:40:21,174 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7300, loss[loss=0.09893, beats_loss=0.01014, ecapa_loss=0.0001183, whisper_loss=0.08761, over 14746.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001545, whisper_loss=0.09051, over 3892750.71 frames. ], batch size: 54, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:40:22,948 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 13:40:24,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2681520.0, ans=0.125 2024-08-14 13:40:33,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.289e+01 2.573e+01 2.951e+01 1.378e+02, threshold=5.146e+01, percent-clipped=1.0 2024-08-14 13:40:36,698 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-14 13:40:48,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2681620.0, ans=0.125 2024-08-14 13:41:04,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2681820.0, ans=0.0 2024-08-14 13:41:15,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2681820.0, ans=0.0 2024-08-14 13:41:20,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2681920.0, ans=0.0 2024-08-14 13:41:30,231 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 13:41:36,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7350, loss[loss=0.07986, beats_loss=0.01161, ecapa_loss=0.0001239, whisper_loss=0.06701, over 15303.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001556, whisper_loss=0.09019, over 3880026.03 frames. ], batch size: 61, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:41:46,743 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 36 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 13:41:50,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2682120.0, ans=0.2 2024-08-14 13:41:52,946 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 34 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 13:42:09,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2682220.0, ans=0.0 2024-08-14 13:42:14,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2682220.0, ans=0.125 2024-08-14 13:42:14,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-14 13:42:33,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2682320.0, ans=0.0 2024-08-14 13:42:37,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2682420.0, ans=0.125 2024-08-14 13:42:39,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2682420.0, ans=0.2 2024-08-14 13:42:50,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7400, loss[loss=0.1073, beats_loss=0.0111, ecapa_loss=0.0001926, whisper_loss=0.09426, over 21562.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.0108, ecapa_loss=0.0001559, whisper_loss=0.08992, over 3873363.13 frames. ], batch size: 90, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:42:55,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2682520.0, ans=0.2 2024-08-14 13:43:01,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2682520.0, ans=0.125 2024-08-14 13:43:02,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.321e+01 2.551e+01 2.887e+01 1.021e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-14 13:43:02,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2682520.0, ans=0.125 2024-08-14 13:43:05,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-08-14 13:43:15,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2682620.0, ans=0.0 2024-08-14 13:43:19,462 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 25 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 13:43:28,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2682720.0, ans=0.125 2024-08-14 13:43:34,113 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 13:43:53,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2682920.0, ans=0.2 2024-08-14 13:44:01,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. 
limit=15.0 2024-08-14 13:44:02,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7450, loss[loss=0.1047, beats_loss=0.01141, ecapa_loss=0.0001408, whisper_loss=0.09192, over 21796.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001551, whisper_loss=0.09079, over 3898523.56 frames. ], batch size: 86, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:44:11,234 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 13:44:19,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2683120.0, ans=10.0 2024-08-14 13:44:32,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=12.0 2024-08-14 13:45:16,420 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7500, loss[loss=0.08553, beats_loss=0.01236, ecapa_loss=0.0001292, whisper_loss=0.07188, over 20290.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001538, whisper_loss=0.09087, over 3902602.96 frames. ], batch size: 82, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:45:28,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.296e+01 2.546e+01 2.865e+01 4.082e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-14 13:45:33,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2683620.0, ans=0.04949747468305833 2024-08-14 13:45:39,121 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 13:45:39,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2683620.0, ans=0.125 2024-08-14 13:45:43,768 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 13:45:47,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2683720.0, ans=0.1 2024-08-14 13:45:51,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2683720.0, ans=0.0 2024-08-14 13:46:14,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2683820.0, ans=0.2 2024-08-14 13:46:18,032 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 13:46:23,996 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 13:46:32,466 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7550, loss[loss=0.08989, beats_loss=0.009712, ecapa_loss=0.0001836, whisper_loss=0.07834, over 20273.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.000154, whisper_loss=0.09086, over 3933834.10 frames. ], batch size: 83, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:46:58,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2684120.0, ans=0.0 2024-08-14 13:47:05,657 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 13:47:18,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.11 vs. 
limit=15.0 2024-08-14 13:47:34,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2684420.0, ans=0.0 2024-08-14 13:47:35,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2684420.0, ans=10.0 2024-08-14 13:47:38,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2684420.0, ans=0.0 2024-08-14 13:47:46,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7600, loss[loss=0.1154, beats_loss=0.008636, ecapa_loss=0.000156, whisper_loss=0.1052, over 21600.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001546, whisper_loss=0.09066, over 3936005.37 frames. ], batch size: 84, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:47:53,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2684520.0, ans=0.125 2024-08-14 13:47:54,491 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 18 from LS+wenet, 33 from Vox, 40 fro AS 2024-08-14 13:47:58,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.371e+01 2.546e+01 2.782e+01 5.094e+01, threshold=5.091e+01, percent-clipped=1.0 2024-08-14 13:48:01,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2684620.0, ans=0.125 2024-08-14 13:48:21,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2684720.0, ans=0.07 2024-08-14 13:48:35,685 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 13:49:00,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7650, loss[loss=0.09068, beats_loss=0.01169, ecapa_loss=0.0001232, whisper_loss=0.07776, over 18786.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001555, whisper_loss=0.09077, over 3927437.82 frames. ], batch size: 73, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:49:04,145 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 13:49:32,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2685220.0, ans=0.0 2024-08-14 13:49:35,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2685220.0, ans=0.125 2024-08-14 13:49:38,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2685220.0, ans=10.0 2024-08-14 13:49:59,409 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 13:50:05,686 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 13:50:07,162 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 13:50:13,963 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7700, loss[loss=0.09216, beats_loss=0.0106, ecapa_loss=0.0001597, whisper_loss=0.07997, over 17608.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001555, whisper_loss=0.09092, over 3927049.31 frames. 
], batch size: 73, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:50:22,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2685520.0, ans=0.0 2024-08-14 13:50:22,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2685520.0, ans=0.0 2024-08-14 13:50:23,345 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 13:50:25,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.559e+01 2.371e+01 2.640e+01 3.039e+01 4.657e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-14 13:51:00,215 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-08-14 13:51:12,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2685920.0, ans=0.2 2024-08-14 13:51:17,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2685920.0, ans=0.0 2024-08-14 13:51:26,529 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7750, loss[loss=0.1056, beats_loss=0.01049, ecapa_loss=0.0001259, whisper_loss=0.09382, over 16942.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.000155, whisper_loss=0.09079, over 3918557.26 frames. ], batch size: 66, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:51:43,481 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-14 13:51:49,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2686120.0, ans=22.5 2024-08-14 13:52:07,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2686220.0, ans=0.125 2024-08-14 13:52:36,585 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 13:52:40,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7800, loss[loss=0.1099, beats_loss=0.01214, ecapa_loss=0.0001349, whisper_loss=0.09645, over 14862.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001551, whisper_loss=0.09036, over 3910034.41 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:52:41,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2686520.0, ans=0.0 2024-08-14 13:52:52,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.425e+01 2.611e+01 2.883e+01 9.855e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-14 13:52:52,531 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 13:53:47,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2686920.0, ans=0.125 2024-08-14 13:53:50,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2686920.0, ans=0.125 2024-08-14 13:53:54,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7850, loss[loss=0.1018, beats_loss=0.01189, ecapa_loss=0.0001732, whisper_loss=0.08814, over 20539.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001555, whisper_loss=0.09093, over 3939081.96 frames. 
], batch size: 89, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:53:56,104 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 13:53:56,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2687020.0, ans=0.0 2024-08-14 13:53:58,013 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-14 13:54:07,281 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 13:54:25,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2687220.0, ans=0.0 2024-08-14 13:54:39,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2687320.0, ans=0.125 2024-08-14 13:54:45,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2687320.0, ans=0.125 2024-08-14 13:55:08,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7900, loss[loss=0.08211, beats_loss=0.01215, ecapa_loss=0.0001757, whisper_loss=0.0682, over 15929.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001549, whisper_loss=0.09056, over 3925309.53 frames. ], batch size: 63, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:55:15,169 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 13:55:20,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.378e+01 2.612e+01 2.895e+01 1.059e+02, threshold=5.225e+01, percent-clipped=1.0 2024-08-14 13:55:27,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2687620.0, ans=0.2 2024-08-14 13:55:34,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2687620.0, ans=0.125 2024-08-14 13:55:47,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2687720.0, ans=0.0 2024-08-14 13:55:49,974 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 13:56:00,545 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 13:56:02,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2687820.0, ans=0.07 2024-08-14 13:56:07,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-14 13:56:08,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2687920.0, ans=0.125 2024-08-14 13:56:20,338 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 13:56:20,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2687920.0, ans=0.125 2024-08-14 13:56:22,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 7950, loss[loss=0.06003, beats_loss=0.01281, ecapa_loss=0.0001258, whisper_loss=0.04596, over 14317.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001545, whisper_loss=0.09023, over 3878482.88 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:56:26,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2688020.0, ans=0.5 2024-08-14 13:56:29,551 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 13:56:37,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-14 13:56:38,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2024-08-14 13:57:09,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2688320.0, ans=0.0 2024-08-14 13:57:09,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2688320.0, ans=0.2 2024-08-14 13:57:15,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2688320.0, ans=0.125 2024-08-14 13:57:23,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2688420.0, ans=0.125 2024-08-14 13:57:37,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8000, loss[loss=0.09844, beats_loss=0.01285, ecapa_loss=0.0001465, whisper_loss=0.08412, over 15490.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01083, ecapa_loss=0.0001541, whisper_loss=0.09005, over 3856449.65 frames. ], batch size: 62, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:57:37,845 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 13:57:40,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2688520.0, ans=0.0 2024-08-14 13:57:42,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2688520.0, ans=0.0 2024-08-14 13:57:44,729 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 13:57:48,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.382e+01 2.629e+01 3.053e+01 3.860e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-14 13:57:55,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2688620.0, ans=0.0 2024-08-14 13:58:07,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.60 vs. limit=5.0 2024-08-14 13:58:11,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2688720.0, ans=0.07 2024-08-14 13:58:18,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2688720.0, ans=0.125 2024-08-14 13:58:24,905 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-14 13:58:50,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8050, loss[loss=0.1042, beats_loss=0.00929, ecapa_loss=0.0001764, whisper_loss=0.09312, over 13819.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.000154, whisper_loss=0.09025, over 3852155.84 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:58:58,125 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 14:00:03,719 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8100, loss[loss=0.09292, beats_loss=0.01008, ecapa_loss=0.0001897, whisper_loss=0.08094, over 19408.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09129, over 3880062.03 frames. ], batch size: 80, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:00:04,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2689520.0, ans=0.0 2024-08-14 14:00:15,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.342e+01 2.614e+01 2.950e+01 9.116e+01, threshold=5.228e+01, percent-clipped=3.0 2024-08-14 14:00:31,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2689620.0, ans=0.04949747468305833 2024-08-14 14:00:37,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2689720.0, ans=0.07 2024-08-14 14:01:13,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2689920.0, ans=0.125 2024-08-14 14:01:15,787 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8150, loss[loss=0.1015, beats_loss=0.01115, ecapa_loss=0.0001733, whisper_loss=0.08864, over 19970.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01059, ecapa_loss=0.0001555, whisper_loss=0.09193, over 3894467.79 frames. ], batch size: 83, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:01:26,701 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 14:01:37,077 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2024-08-14 14:01:48,064 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 14:02:09,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2690320.0, ans=0.125 2024-08-14 14:02:23,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2690420.0, ans=0.125 2024-08-14 14:02:27,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-14 14:02:29,239 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8200, loss[loss=0.1064, beats_loss=0.007952, ecapa_loss=0.000188, whisper_loss=0.09658, over 20954.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01063, ecapa_loss=0.0001546, whisper_loss=0.09208, over 3911313.51 frames. ], batch size: 82, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:02:40,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.302e+01 2.493e+01 2.763e+01 4.005e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-14 14:02:59,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=12.0 2024-08-14 14:03:00,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2690720.0, ans=0.1 2024-08-14 14:03:11,751 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 14:03:27,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2690920.0, ans=0.125 2024-08-14 14:03:33,629 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 14:03:40,975 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 14:03:42,198 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8250, loss[loss=0.1097, beats_loss=0.009805, ecapa_loss=0.0001242, whisper_loss=0.09861, over 21210.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001554, whisper_loss=0.09197, over 3913494.11 frames. ], batch size: 79, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:03:44,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2024-08-14 14:04:00,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2691120.0, ans=0.125 2024-08-14 14:04:22,840 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 14:04:27,466 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.538e+01 2024-08-14 14:04:27,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=12.0 2024-08-14 14:04:30,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2691320.0, ans=0.025 2024-08-14 14:04:39,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2691320.0, ans=0.125 2024-08-14 14:04:47,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2691420.0, ans=0.0 2024-08-14 14:04:55,280 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 14:04:56,475 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8300, loss[loss=0.0957, beats_loss=0.01237, ecapa_loss=0.0001378, whisper_loss=0.08195, over 17301.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001547, whisper_loss=0.09109, over 3883160.34 frames. ], batch size: 67, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:04:56,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2691520.0, ans=0.125 2024-08-14 14:05:08,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.406e+01 2.618e+01 2.998e+01 6.409e+01, threshold=5.237e+01, percent-clipped=1.0 2024-08-14 14:05:13,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=23.31 vs. limit=15.0 2024-08-14 14:05:23,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.14 vs. limit=22.5 2024-08-14 14:05:32,240 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 14:05:36,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2691720.0, ans=0.125 2024-08-14 14:05:39,467 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 14:05:42,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2691820.0, ans=0.125 2024-08-14 14:05:50,310 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-14 14:06:09,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2692020.0, ans=0.125 2024-08-14 14:06:10,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8350, loss[loss=0.09158, beats_loss=0.01164, ecapa_loss=0.0001424, whisper_loss=0.07852, over 21567.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001544, whisper_loss=0.09102, over 3881171.43 frames. ], batch size: 87, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:06:32,277 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 14:06:44,097 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 14:06:55,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2692220.0, ans=0.1 2024-08-14 14:07:14,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2692420.0, ans=0.125 2024-08-14 14:07:16,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2692420.0, ans=0.0 2024-08-14 14:07:30,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8400, loss[loss=0.1198, beats_loss=0.008061, ecapa_loss=0.0001614, whisper_loss=0.1101, over 19155.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001537, whisper_loss=0.09131, over 3919037.76 frames. 
], batch size: 71, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:07:37,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2692520.0, ans=0.125 2024-08-14 14:07:43,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.388e+01 2.632e+01 2.972e+01 1.432e+02, threshold=5.263e+01, percent-clipped=3.0 2024-08-14 14:07:44,073 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 14:07:49,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2692620.0, ans=0.07 2024-08-14 14:07:54,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2692620.0, ans=0.125 2024-08-14 14:08:00,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2692720.0, ans=0.05 2024-08-14 14:08:00,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2692720.0, ans=0.2 2024-08-14 14:08:01,872 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 25 from LS+wenet, 12 from Vox, 17 fro AS 2024-08-14 14:08:20,459 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 14:08:32,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2692920.0, ans=0.125 2024-08-14 14:08:35,683 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2024-08-14 14:08:48,893 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8450, loss[loss=0.1257, beats_loss=0.008495, ecapa_loss=0.0001773, whisper_loss=0.1154, over 22793.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001554, whisper_loss=0.09133, over 3927534.37 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:08:49,165 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 19 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 14:08:54,973 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 14:09:10,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2693120.0, ans=0.0 2024-08-14 14:09:26,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2693220.0, ans=0.125 2024-08-14 14:09:32,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2693220.0, ans=0.125 2024-08-14 14:09:35,843 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 14:09:36,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2693320.0, ans=0.0 2024-08-14 14:09:40,805 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:09:42,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2693320.0, ans=10.0 2024-08-14 14:09:55,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2693420.0, ans=0.1 2024-08-14 14:09:55,981 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-14 14:09:56,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2693420.0, ans=0.05 2024-08-14 14:09:57,551 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 14:10:01,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=22.5 2024-08-14 14:10:06,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8500, loss[loss=0.1224, beats_loss=0.01012, ecapa_loss=0.0001811, whisper_loss=0.1104, over 21507.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001539, whisper_loss=0.09112, over 3930845.50 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:10:19,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.292e+01 2.601e+01 3.025e+01 1.070e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 14:10:24,245 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 14:10:27,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-14 14:10:32,251 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 14:10:38,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2693720.0, ans=0.125 2024-08-14 14:10:51,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.29 vs. 
limit=15.0 2024-08-14 14:11:06,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2693820.0, ans=0.0 2024-08-14 14:11:07,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=12.0 2024-08-14 14:11:18,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2693920.0, ans=0.0 2024-08-14 14:11:27,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8550, loss[loss=0.101, beats_loss=0.01208, ecapa_loss=0.0001347, whisper_loss=0.08759, over 18694.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001532, whisper_loss=0.09118, over 3919748.47 frames. ], batch size: 76, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:11:28,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-08-14 14:11:29,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2694020.0, ans=0.125 2024-08-14 14:11:31,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2694020.0, ans=0.125 2024-08-14 14:11:38,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2694020.0, ans=0.0 2024-08-14 14:11:44,522 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 14:11:44,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2694120.0, ans=0.2 2024-08-14 14:11:54,343 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 14:12:04,316 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 14:12:14,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2024-08-14 14:12:15,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2694320.0, ans=0.1 2024-08-14 14:12:21,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2694320.0, ans=0.09899494936611666 2024-08-14 14:12:38,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2694420.0, ans=0.0 2024-08-14 14:12:40,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2694420.0, ans=0.2 2024-08-14 14:12:44,246 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 14:12:45,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8600, loss[loss=0.1154, beats_loss=0.01013, ecapa_loss=0.0001384, whisper_loss=0.1039, over 19979.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01062, ecapa_loss=0.0001538, whisper_loss=0.09175, over 3916760.34 frames. 
], batch size: 77, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:12:52,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2694520.0, ans=0.1
2024-08-14 14:12:57,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.473e+01 2.757e+01 3.150e+01 4.170e+01, threshold=5.513e+01, percent-clipped=0.0
2024-08-14 14:13:07,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2694620.0, ans=0.125
2024-08-14 14:13:11,815 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 from AS
2024-08-14 14:13:14,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2694720.0, ans=0.1
2024-08-14 14:13:46,116 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 14:14:00,905 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 from AS
2024-08-14 14:14:01,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2694920.0, ans=0.125
2024-08-14 14:14:03,555 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8650, loss[loss=0.09956, beats_loss=0.01198, ecapa_loss=0.0001522, whisper_loss=0.08605, over 15940.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001541, whisper_loss=0.09132, over 3914252.10 frames. ], batch size: 64, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:14:03,978 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 from AS
2024-08-14 14:14:25,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2695120.0, ans=0.125
2024-08-14 14:14:26,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2695120.0, ans=0.125
2024-08-14 14:14:40,336 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 from AS
2024-08-14 14:15:00,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2695320.0, ans=0.1
2024-08-14 14:15:17,474 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 11 from Vox, 32 from AS
2024-08-14 14:15:18,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8700, loss[loss=0.1002, beats_loss=0.01231, ecapa_loss=0.0001303, whisper_loss=0.08659, over 16565.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001555, whisper_loss=0.09146, over 3894304.92 frames. ], batch size: 63, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:15:19,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2695520.0, ans=0.125
2024-08-14 14:15:30,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.361e+01 2.667e+01 2.943e+01 6.389e+01, threshold=5.334e+01, percent-clipped=1.0
2024-08-14 14:15:30,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2695520.0, ans=0.125
2024-08-14 14:15:37,894 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 from AS
2024-08-14 14:15:48,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2695720.0, ans=0.125
2024-08-14 14:15:49,736 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 32 from LS+wenet, 17 from Vox, 28 from AS
2024-08-14 14:16:06,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2695820.0, ans=0.125
2024-08-14 14:16:18,054 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 from AS
2024-08-14 14:16:22,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2695920.0, ans=0.1
2024-08-14 14:16:31,779 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8750, loss[loss=0.108, beats_loss=0.01047, ecapa_loss=0.0001547, whisper_loss=0.09593, over 23444.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001548, whisper_loss=0.09132, over 3871402.56 frames. ], batch size: 93, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:16:32,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2696020.0, ans=10.0
2024-08-14 14:16:46,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.91 vs. limit=22.5
2024-08-14 14:16:51,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2696120.0, ans=0.0
2024-08-14 14:17:12,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2696220.0, ans=0.0
2024-08-14 14:17:18,421 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS
2024-08-14 14:17:23,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0
2024-08-14 14:17:44,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8800, loss[loss=0.09197, beats_loss=0.01088, ecapa_loss=0.0001285, whisper_loss=0.07981, over 16562.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001545, whisper_loss=0.09155, over 3899658.99 frames. ], batch size: 62, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:17:55,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.470e+01 2.757e+01 3.014e+01 7.462e+01, threshold=5.513e+01, percent-clipped=1.0
2024-08-14 14:18:58,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8850, loss[loss=0.09183, beats_loss=0.01227, ecapa_loss=0.0001417, whisper_loss=0.07814, over 19205.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001533, whisper_loss=0.09082, over 3889889.26 frames. ], batch size: 75, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:19:02,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2697020.0, ans=0.0
2024-08-14 14:19:19,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2697120.0, ans=0.1
2024-08-14 14:19:29,205 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 23 from Vox, 25 from AS
2024-08-14 14:19:31,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2697220.0, ans=10.0
2024-08-14 14:19:42,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2697320.0, ans=0.0
2024-08-14 14:19:45,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2697320.0, ans=0.125
2024-08-14 14:19:46,820 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 14 from Vox, 43 from AS
2024-08-14 14:19:50,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2697320.0, ans=0.05
2024-08-14 14:19:57,201 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS
2024-08-14 14:20:10,596 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 from AS
2024-08-14 14:20:11,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8900, loss[loss=0.08907, beats_loss=0.01082, ecapa_loss=0.0001746, whisper_loss=0.0765, over 16211.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001535, whisper_loss=0.09112, over 3873598.33 frames. ], batch size: 67, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:20:23,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.627e+01 2.296e+01 2.497e+01 2.712e+01 4.460e+01, threshold=4.994e+01, percent-clipped=0.0
2024-08-14 14:20:24,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2697520.0, ans=0.2
2024-08-14 14:20:45,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=12.0
2024-08-14 14:21:18,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2697920.0, ans=0.125
2024-08-14 14:21:25,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 8950, loss[loss=0.0845, beats_loss=0.01165, ecapa_loss=0.0001328, whisper_loss=0.07153, over 14840.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01082, ecapa_loss=0.000153, whisper_loss=0.09017, over 3828871.17 frames. ], batch size: 57, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:21:34,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2698020.0, ans=0.125
2024-08-14 14:21:35,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2698020.0, ans=0.125
2024-08-14 14:21:37,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2698020.0, ans=0.125
2024-08-14 14:21:37,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2698020.0, ans=0.125
2024-08-14 14:21:44,505 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 13 from Vox, 43 from AS
2024-08-14 14:21:55,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2698220.0, ans=0.125
2024-08-14 14:21:55,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2698220.0, ans=0.1
2024-08-14 14:21:56,556 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS
2024-08-14 14:22:07,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5
2024-08-14 14:22:12,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5
2024-08-14 14:22:13,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0
2024-08-14 14:22:13,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=15.0
2024-08-14 14:22:16,101 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 30 from Vox, 40 from AS
2024-08-14 14:22:39,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9000, loss[loss=0.1073, beats_loss=0.01056, ecapa_loss=0.0001611, whisper_loss=0.09512, over 22263.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001542, whisper_loss=0.09103, over 3852354.34 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:22:39,089 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-14 14:23:17,723 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005393, whisper_loss=0.2473, over 922467.00 frames.
2024-08-14 14:23:35,624 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on SV_voxceleb1: loss=0.00426, beats_loss=0, ecapa_loss=0.000426, whisper_loss=0, over 939242.00 frames.
2024-08-14 14:24:02,791 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.8673, 1.9166, 1.9001, 1.6314, 2.4285, 1.8874, 1.9072, 1.9097], device='cuda:1')
2024-08-14 14:25:23,948 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 14:25:23,951 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-14 14:25:34,727 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 from AS
2024-08-14 14:25:35,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.561e+01 2.926e+01 5.640e+01, threshold=5.122e+01, percent-clipped=1.0
2024-08-14 14:25:49,227 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 14:25:50,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2698620.0, ans=0.125
2024-08-14 14:25:55,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2698720.0, ans=0.0
2024-08-14 14:26:03,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2698720.0, ans=0.125
2024-08-14 14:26:12,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2698820.0, ans=0.125
2024-08-14 14:26:13,866 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 from AS
2024-08-14 14:26:15,344 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 14:26:17,325 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 11 from Vox, 27 from AS
2024-08-14 14:26:30,087 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS
2024-08-14 14:26:38,389 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9050, loss[loss=0.09737, beats_loss=0.01148, ecapa_loss=0.0001753, whisper_loss=0.08413, over 19273.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.000155, whisper_loss=0.09118, over 3872308.40 frames. ], batch size: 79, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:26:39,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2699020.0, ans=0.0
2024-08-14 14:26:44,921 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 14:27:26,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2699320.0, ans=0.125
2024-08-14 14:27:26,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2699320.0, ans=0.0
2024-08-14 14:27:45,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2699420.0, ans=0.125
2024-08-14 14:27:52,351 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9100, loss[loss=0.07683, beats_loss=0.01084, ecapa_loss=0.0001893, whisper_loss=0.0641, over 17342.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.000155, whisper_loss=0.0908, over 3875621.51 frames. ], batch size: 75, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:27:55,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0
2024-08-14 14:28:04,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.245e+01 2.534e+01 2.882e+01 3.902e+01, threshold=5.067e+01, percent-clipped=0.0
2024-08-14 14:28:15,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2699620.0, ans=0.1
2024-08-14 14:28:32,029 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 14:28:34,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. limit=10.0
2024-08-14 14:28:39,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2699820.0, ans=0.09899494936611666
2024-08-14 14:28:49,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2699820.0, ans=0.1
2024-08-14 14:28:52,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2699920.0, ans=0.0
2024-08-14 14:28:53,924 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 19 from Vox, 38 from AS
2024-08-14 14:29:05,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2700020.0, ans=15.0
2024-08-14 14:29:06,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9150, loss[loss=0.08536, beats_loss=0.01332, ecapa_loss=0.0001098, whisper_loss=0.07094, over 23439.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001537, whisper_loss=0.09007, over 3910094.48 frames. ], batch size: 93, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:29:06,813 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 14:29:15,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=12.0
2024-08-14 14:29:28,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0
2024-08-14 14:29:54,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2700320.0, ans=0.1
2024-08-14 14:29:59,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2700320.0, ans=0.0
2024-08-14 14:30:05,882 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 from AS
2024-08-14 14:30:12,830 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 from AS
2024-08-14 14:30:15,929 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 from AS
2024-08-14 14:30:19,881 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9200, loss[loss=0.09268, beats_loss=0.01211, ecapa_loss=0.0001684, whisper_loss=0.07888, over 17975.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001541, whisper_loss=0.09027, over 3915610.66 frames. ], batch size: 74, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:30:24,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2700520.0, ans=0.0
2024-08-14 14:30:31,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.280e+01 2.601e+01 2.975e+01 5.180e+01, threshold=5.201e+01, percent-clipped=1.0
2024-08-14 14:30:35,812 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS
2024-08-14 14:30:42,356 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0
2024-08-14 14:30:46,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0
2024-08-14 14:31:02,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2700820.0, ans=0.2
2024-08-14 14:31:05,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2700820.0, ans=0.125
2024-08-14 14:31:17,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.78 vs. limit=10.0
2024-08-14 14:31:17,920 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 21 from Vox, 15 from AS
2024-08-14 14:31:21,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2700920.0, ans=0.2
2024-08-14 14:31:29,383 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 from AS
2024-08-14 14:31:31,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9250, loss[loss=0.1173, beats_loss=0.008979, ecapa_loss=0.0001318, whisper_loss=0.107, over 21780.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.000156, whisper_loss=0.09025, over 3935245.17 frames. ], batch size: 83, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:31:33,418 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 35 from Vox, 35 from AS
2024-08-14 14:32:21,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2701320.0, ans=0.125
2024-08-14 14:32:26,775 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 14:32:27,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2701320.0, ans=6.0
2024-08-14 14:32:37,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2701420.0, ans=0.125
2024-08-14 14:32:43,972 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9300, loss[loss=0.09725, beats_loss=0.01109, ecapa_loss=0.0001477, whisper_loss=0.08468, over 17064.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001548, whisper_loss=0.09135, over 3950469.01 frames. ], batch size: 67, lr: 3.28e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:32:56,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.362e+01 2.551e+01 2.899e+01 4.764e+01, threshold=5.103e+01, percent-clipped=0.0
2024-08-14 14:32:56,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2701520.0, ans=0.1
2024-08-14 14:32:59,052 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 from AS
2024-08-14 14:33:16,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2701720.0, ans=0.125
2024-08-14 14:33:30,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2701820.0, ans=0.125
2024-08-14 14:33:42,956 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 from AS
2024-08-14 14:33:57,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9350, loss[loss=0.09617, beats_loss=0.0112, ecapa_loss=0.0001144, whisper_loss=0.08382, over 16973.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001545, whisper_loss=0.09064, over 3928571.05 frames. ], batch size: 64, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:34:02,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2024-08-14 14:34:05,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2702020.0, ans=0.125
2024-08-14 14:34:26,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=12.0
2024-08-14 14:34:38,810 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 from AS
2024-08-14 14:34:39,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2702220.0, ans=0.0
2024-08-14 14:34:53,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2702320.0, ans=0.1
2024-08-14 14:34:58,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2702420.0, ans=0.125
2024-08-14 14:35:11,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9400, loss[loss=0.09951, beats_loss=0.01073, ecapa_loss=0.0001541, whisper_loss=0.08724, over 17572.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001551, whisper_loss=0.09119, over 3895799.28 frames. ], batch size: 69, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:35:23,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.407e+01 2.622e+01 2.905e+01 1.999e+02, threshold=5.243e+01, percent-clipped=1.0
2024-08-14 14:35:34,368 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 from AS
2024-08-14 14:35:41,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2702720.0, ans=0.125
2024-08-14 14:35:49,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2702720.0, ans=0.2
2024-08-14 14:35:57,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2702820.0, ans=0.0
2024-08-14 14:36:08,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0
2024-08-14 14:36:14,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2702920.0, ans=0.125
2024-08-14 14:36:17,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2702920.0, ans=0.0
2024-08-14 14:36:25,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9450, loss[loss=0.1036, beats_loss=0.01147, ecapa_loss=0.0001297, whisper_loss=0.09081, over 15590.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001561, whisper_loss=0.09102, over 3890255.99 frames. ], batch size: 61, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:36:46,001 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 from AS
2024-08-14 14:37:36,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2703520.0, ans=0.0
2024-08-14 14:37:37,343 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9500, loss[loss=0.1007, beats_loss=0.009259, ecapa_loss=0.0001633, whisper_loss=0.0898, over 18667.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001565, whisper_loss=0.09074, over 3900265.29 frames. ], batch size: 75, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:37:48,984 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.397e+01 2.649e+01 2.966e+01 9.786e+01, threshold=5.299e+01, percent-clipped=1.0
2024-08-14 14:37:50,696 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 from AS
2024-08-14 14:38:08,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2703720.0, ans=0.1
2024-08-14 14:38:08,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2703720.0, ans=0.125
2024-08-14 14:38:17,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2703720.0, ans=0.125
2024-08-14 14:38:31,886 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 18 from Vox, 18 from AS
2024-08-14 14:38:43,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2703920.0, ans=0.125
2024-08-14 14:38:49,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2704020.0, ans=0.125
2024-08-14 14:38:50,527 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9550, loss[loss=0.08662, beats_loss=0.01121, ecapa_loss=0.0001339, whisper_loss=0.07407, over 22988.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001586, whisper_loss=0.09002, over 3875765.45 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:38:58,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2704020.0, ans=0.1
2024-08-14 14:39:02,208 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.157e-03
2024-08-14 14:39:12,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0
2024-08-14 14:39:44,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2704320.0, ans=0.0
2024-08-14 14:40:05,758 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 10 from Vox, 29 from AS
2024-08-14 14:40:07,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2704420.0, ans=0.125
2024-08-14 14:40:17,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9600, loss[loss=0.08657, beats_loss=0.01312, ecapa_loss=0.0001182, whisper_loss=0.07227, over 20648.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001587, whisper_loss=0.09022, over 3885579.37 frames. ], batch size: 78, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:40:22,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. limit=10.0
2024-08-14 14:40:24,873 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 from AS
2024-08-14 14:40:31,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.443e+01 2.792e+01 3.086e+01 6.637e+01, threshold=5.584e+01, percent-clipped=2.0
2024-08-14 14:40:32,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2704520.0, ans=0.125
2024-08-14 14:40:40,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2704620.0, ans=0.125
2024-08-14 14:41:01,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0
2024-08-14 14:41:05,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2704720.0, ans=0.125
2024-08-14 14:41:17,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2704820.0, ans=0.125
2024-08-14 14:41:21,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2704820.0, ans=0.125
2024-08-14 14:41:23,064 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 from AS
2024-08-14 14:41:39,684 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 from AS
2024-08-14 14:41:48,885 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9650, loss[loss=0.1168, beats_loss=0.007759, ecapa_loss=0.0001764, whisper_loss=0.1073, over 17701.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001582, whisper_loss=0.09015, over 3860981.14 frames. ], batch size: 72, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:41:52,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5
2024-08-14 14:41:54,221 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 14:42:23,739 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 14:42:26,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2705220.0, ans=0.0
2024-08-14 14:42:31,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0
2024-08-14 14:42:48,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2705320.0, ans=0.07
2024-08-14 14:43:05,803 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9700, loss[loss=0.1092, beats_loss=0.008929, ecapa_loss=0.0001545, whisper_loss=0.09872, over 22538.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001582, whisper_loss=0.09071, over 3841035.97 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:43:17,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.210e+01 2.464e+01 2.850e+01 7.455e+01, threshold=4.928e+01, percent-clipped=1.0
2024-08-14 14:43:29,279 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.113e-01
2024-08-14 14:43:39,359 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 14:43:40,832 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 from AS
2024-08-14 14:43:41,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2705720.0, ans=0.1
2024-08-14 14:43:58,531 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 27 from Vox, 26 from AS
2024-08-14 14:43:58,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2705820.0, ans=0.0
2024-08-14 14:44:00,088 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 26 from Vox, 14 from AS
2024-08-14 14:44:00,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2705820.0, ans=0.2
2024-08-14 14:44:03,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2705820.0, ans=0.0
2024-08-14 14:44:05,628 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 from AS
2024-08-14 14:44:09,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2705920.0, ans=0.025
2024-08-14 14:44:20,323 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9750, loss[loss=0.1243, beats_loss=0.01003, ecapa_loss=0.0001473, whisper_loss=0.1128, over 22733.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01048, ecapa_loss=0.0001574, whisper_loss=0.09086, over 3829551.17 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:44:40,859 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 32 from Vox, 38 from AS
2024-08-14 14:44:43,835 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS
2024-08-14 14:44:58,258 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS
2024-08-14 14:44:58,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2706220.0, ans=0.125
2024-08-14 14:45:10,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2706320.0, ans=0.125
2024-08-14 14:45:37,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9800, loss[loss=0.1138, beats_loss=0.01101, ecapa_loss=0.0001684, whisper_loss=0.1011, over 23058.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001574, whisper_loss=0.09112, over 3868737.62 frames. ], batch size: 91, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:45:48,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2706520.0, ans=0.0
2024-08-14 14:45:49,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.322e+01 2.608e+01 2.964e+01 4.916e+01, threshold=5.216e+01, percent-clipped=0.0
2024-08-14 14:45:57,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2706620.0, ans=0.125
2024-08-14 14:46:04,968 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 18 from Vox, 20 from AS
2024-08-14 14:46:10,612 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 39 from LS+wenet, 18 from Vox, 32 from AS
2024-08-14 14:46:13,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2706720.0, ans=0.1
2024-08-14 14:46:15,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=12.0
2024-08-14 14:46:40,442 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.652e+01
2024-08-14 14:46:44,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2706920.0, ans=0.125
2024-08-14 14:46:51,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9850, loss[loss=0.08694, beats_loss=0.0124, ecapa_loss=0.0001608, whisper_loss=0.07293, over 16071.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001564, whisper_loss=0.09171, over 3895012.38 frames. ], batch size: 67, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:46:56,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2707020.0, ans=0.1
2024-08-14 14:47:05,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2707120.0, ans=0.1
2024-08-14 14:47:18,570 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 from AS
2024-08-14 14:47:20,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2707220.0, ans=0.125
2024-08-14 14:47:27,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2707220.0, ans=0.125
2024-08-14 14:47:42,643 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 16 from Vox, 22 from AS
2024-08-14 14:47:47,248 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts.
22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 14:48:00,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2707420.0, ans=0.0 2024-08-14 14:48:07,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9900, loss[loss=0.1115, beats_loss=0.01057, ecapa_loss=0.0001241, whisper_loss=0.09971, over 18556.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0107, ecapa_loss=0.0001549, whisper_loss=0.09153, over 3905198.53 frames. ], batch size: 69, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:48:10,549 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-14 14:48:15,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2707520.0, ans=0.0 2024-08-14 14:48:16,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2707520.0, ans=0.125 2024-08-14 14:48:16,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2707520.0, ans=0.025 2024-08-14 14:48:19,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.713e+01 2.970e+01 4.614e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-14 14:48:34,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2707620.0, ans=0.125 2024-08-14 14:48:37,429 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 14:48:51,140 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 11 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 14:48:56,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.69 vs. 
limit=12.0 2024-08-14 14:48:57,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2707820.0, ans=0.125 2024-08-14 14:49:05,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2024-08-14 14:49:38,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 9950, loss[loss=0.1132, beats_loss=0.01153, ecapa_loss=0.0001583, whisper_loss=0.1001, over 22596.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.000155, whisper_loss=0.09152, over 3877478.88 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:50:03,083 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 18 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 14:50:07,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2708120.0, ans=0.0 2024-08-14 14:50:09,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2708120.0, ans=0.125 2024-08-14 14:50:23,089 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 14:50:27,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2708220.0, ans=0.0 2024-08-14 14:50:44,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2708320.0, ans=0.04949747468305833 2024-08-14 14:51:19,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2708420.0, ans=0.1 2024-08-14 14:51:27,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10000, loss[loss=0.1408, beats_loss=0.006769, ecapa_loss=0.0001429, whisper_loss=0.1326, over 18087.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001559, whisper_loss=0.09109, over 3856565.49 frames. ], batch size: 65, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:51:46,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.366e+01 2.562e+01 2.817e+01 3.470e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-14 14:51:51,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2708620.0, ans=0.1 2024-08-14 14:51:53,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2708620.0, ans=0.125 2024-08-14 14:52:08,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2708720.0, ans=0.02 2024-08-14 14:52:30,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.47 vs. limit=10.0 2024-08-14 14:52:38,682 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 14:52:45,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2708920.0, ans=0.1 2024-08-14 14:52:45,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2708920.0, ans=0.0 2024-08-14 14:52:57,257 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.639e+05 2024-08-14 14:52:58,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10050, loss[loss=0.1057, beats_loss=0.008402, ecapa_loss=0.0001754, whisper_loss=0.09555, over 20580.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001562, whisper_loss=0.09092, over 3861354.80 frames. ], batch size: 82, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:53:08,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2709020.0, ans=0.125 2024-08-14 14:53:34,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2709220.0, ans=0.0 2024-08-14 14:53:34,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2709220.0, ans=15.0 2024-08-14 14:53:39,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2709220.0, ans=0.0 2024-08-14 14:53:40,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2709220.0, ans=0.125 2024-08-14 14:53:45,494 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 14:53:54,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.36 vs. limit=22.5 2024-08-14 14:54:16,976 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10100, loss[loss=0.117, beats_loss=0.009553, ecapa_loss=0.0001801, whisper_loss=0.1057, over 22174.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001559, whisper_loss=0.0908, over 3869642.20 frames. ], batch size: 91, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:54:22,780 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 14:54:29,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.269e+01 2.495e+01 2.791e+01 4.696e+01, threshold=4.989e+01, percent-clipped=0.0 2024-08-14 14:55:08,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-14 14:55:10,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2709820.0, ans=0.125 2024-08-14 14:55:13,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2709820.0, ans=0.0 2024-08-14 14:55:20,681 WARNING [optim.py:496] (1/4) Scaling gradients by 0.040875811129808426, model_norm_threshold=49.8900260925293 2024-08-14 14:55:20,847 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.113e+05, grad_sumsq=3.113e+05, orig_rms_sq=1.000e+00 2024-08-14 14:55:25,346 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
30 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 14:55:27,181 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 14:55:32,694 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-14 14:55:34,410 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10150, loss[loss=0.1096, beats_loss=0.01211, ecapa_loss=0.0001348, whisper_loss=0.09617, over 23027.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001568, whisper_loss=0.09096, over 3864448.02 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:55:38,279 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 14:55:43,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2710020.0, ans=0.0 2024-08-14 14:55:50,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2710120.0, ans=0.2 2024-08-14 14:55:51,837 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 14:55:52,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-08-14 14:55:54,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2710120.0, ans=0.0 2024-08-14 14:55:56,837 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 14:55:58,142 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
20 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 14:55:58,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2710120.0, ans=0.125 2024-08-14 14:55:58,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2710120.0, ans=0.125 2024-08-14 14:56:07,699 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 14:56:17,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.35 vs. limit=22.5 2024-08-14 14:56:26,156 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:56:37,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2710420.0, ans=0.2 2024-08-14 14:56:41,152 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 14:56:51,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10200, loss[loss=0.105, beats_loss=0.008151, ecapa_loss=0.0002031, whisper_loss=0.09478, over 20912.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.000158, whisper_loss=0.09063, over 3867521.61 frames. ], batch size: 88, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:57:04,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.342e+01 2.619e+01 2.972e+01 1.221e+03, threshold=5.239e+01, percent-clipped=2.0 2024-08-14 14:57:06,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2710620.0, ans=0.07 2024-08-14 14:57:15,805 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 14:57:27,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2710720.0, ans=0.2 2024-08-14 14:57:28,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2710720.0, ans=0.125 2024-08-14 14:57:30,904 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-08-14 14:57:56,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2710920.0, ans=0.0 2024-08-14 14:57:59,625 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.204e-02 2024-08-14 14:58:08,642 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10250, loss[loss=0.1061, beats_loss=0.01211, ecapa_loss=0.0001236, whisper_loss=0.09276, over 22404.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001576, whisper_loss=0.09114, over 3884612.67 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:58:13,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.05 vs. limit=10.0 2024-08-14 14:58:15,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2711020.0, ans=0.1 2024-08-14 14:59:04,099 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 14:59:24,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2711420.0, ans=0.1 2024-08-14 14:59:24,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2711420.0, ans=0.125 2024-08-14 14:59:29,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10300, loss[loss=0.09721, beats_loss=0.01011, ecapa_loss=0.0001617, whisper_loss=0.08548, over 17513.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01057, ecapa_loss=0.0001576, whisper_loss=0.09134, over 3892255.04 frames. ], batch size: 68, lr: 3.27e-03, grad_scale: 1.152921504606847e+18 2024-08-14 14:59:41,486 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.309e+01 2.627e+01 3.015e+01 4.712e+01, threshold=5.254e+01, percent-clipped=0.0 2024-08-14 14:59:49,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2711620.0, ans=0.0 2024-08-14 15:00:22,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2024-08-14 15:00:27,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2711820.0, ans=0.125 2024-08-14 15:00:32,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2711820.0, ans=0.2 2024-08-14 15:00:35,268 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 15:00:46,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. 
limit=15.0 2024-08-14 15:00:54,239 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10350, loss[loss=0.1177, beats_loss=0.009296, ecapa_loss=0.0002035, whisper_loss=0.1064, over 21918.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.0001565, whisper_loss=0.092, over 3901112.37 frames. ], batch size: 92, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:00:54,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2712020.0, ans=0.125 2024-08-14 15:01:05,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-14 15:01:06,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2024-08-14 15:01:12,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.92 vs. limit=22.5 2024-08-14 15:01:24,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2712120.0, ans=0.125 2024-08-14 15:01:28,458 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 15:01:35,809 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 15:01:47,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2712320.0, ans=0.125 2024-08-14 15:01:53,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2024-08-14 15:02:01,299 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 15:02:02,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2712420.0, ans=0.125 2024-08-14 15:02:11,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=12.0 2024-08-14 15:02:12,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10400, loss[loss=0.1012, beats_loss=0.009333, ecapa_loss=0.0002024, whisper_loss=0.08983, over 15583.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001576, whisper_loss=0.09151, over 3884402.38 frames. ], batch size: 68, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:02:25,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.275e+01 2.638e+01 3.125e+01 4.616e+01, threshold=5.275e+01, percent-clipped=0.0 2024-08-14 15:02:42,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2712720.0, ans=0.2 2024-08-14 15:02:43,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2712720.0, ans=0.0 2024-08-14 15:02:48,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2024-08-14 15:03:03,423 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-14 15:03:09,926 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 15:03:16,292 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 15:03:18,251 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
19 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 15:03:26,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10450, loss[loss=0.07216, beats_loss=0.01554, ecapa_loss=0.0001383, whisper_loss=0.05523, over 18194.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01055, ecapa_loss=0.0001574, whisper_loss=0.09159, over 3883348.36 frames. ], batch size: 75, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:03:47,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0 2024-08-14 15:03:52,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2713120.0, ans=0.1 2024-08-14 15:03:59,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2713220.0, ans=0.2 2024-08-14 15:04:02,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2713220.0, ans=0.125 2024-08-14 15:04:06,449 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-14 15:04:17,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2713320.0, ans=0.125 2024-08-14 15:04:18,474 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-14 15:04:24,527 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 12 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 15:04:26,052 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 15:04:26,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2713420.0, ans=0.025 2024-08-14 15:04:33,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=12.0 2024-08-14 15:04:42,076 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10500, loss[loss=0.1289, beats_loss=0.009994, ecapa_loss=0.000145, whisper_loss=0.1175, over 23703.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01057, ecapa_loss=0.0001574, whisper_loss=0.09146, over 3885820.64 frames. ], batch size: 91, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:04:44,180 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.746e+05 2024-08-14 15:04:46,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2713520.0, ans=0.0 2024-08-14 15:04:54,142 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 15:04:55,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.389e+01 2.560e+01 2.877e+01 3.688e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-14 15:05:14,344 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:05:21,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2713720.0, ans=0.125 2024-08-14 15:05:25,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-14 15:05:27,244 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
16 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 15:05:46,604 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 15:05:51,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2024-08-14 15:05:56,565 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10550, loss[loss=0.1096, beats_loss=0.009011, ecapa_loss=0.0001648, whisper_loss=0.09889, over 23228.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001564, whisper_loss=0.09087, over 3879981.53 frames. ], batch size: 91, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:06:17,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.57 vs. limit=10.0 2024-08-14 15:07:10,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2714520.0, ans=0.0 2024-08-14 15:07:10,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10600, loss[loss=0.1268, beats_loss=0.008498, ecapa_loss=0.0001607, whisper_loss=0.1167, over 23615.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001556, whisper_loss=0.09132, over 3902049.80 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:07:13,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2024-08-14 15:07:16,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.87 vs. 
limit=15.0 2024-08-14 15:07:24,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.333e+01 2.524e+01 2.900e+01 4.921e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-14 15:07:26,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2714620.0, ans=0.0 2024-08-14 15:07:45,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2714720.0, ans=0.0 2024-08-14 15:07:47,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2714720.0, ans=0.0 2024-08-14 15:07:48,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2714720.0, ans=0.0 2024-08-14 15:07:59,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2714820.0, ans=0.1 2024-08-14 15:08:24,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-14 15:08:25,292 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10650, loss[loss=0.11, beats_loss=0.009032, ecapa_loss=0.0001538, whisper_loss=0.09945, over 19405.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01061, ecapa_loss=0.0001554, whisper_loss=0.0918, over 3916319.64 frames. 
], batch size: 73, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:08:41,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2715120.0, ans=0.0 2024-08-14 15:08:51,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2715120.0, ans=0.0 2024-08-14 15:08:58,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2715220.0, ans=0.125 2024-08-14 15:09:05,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2715220.0, ans=0.125 2024-08-14 15:09:39,652 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10700, loss[loss=0.1052, beats_loss=0.01039, ecapa_loss=0.0001464, whisper_loss=0.09331, over 20803.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.000155, whisper_loss=0.09168, over 3923460.71 frames. ], batch size: 79, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:09:53,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.367e+01 2.619e+01 3.037e+01 4.020e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-14 15:10:04,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2715620.0, ans=0.2 2024-08-14 15:10:18,700 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 15:10:37,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=12.0 2024-08-14 15:10:41,228 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 20 from LS+wenet, 28 from Vox, 48 fro AS 2024-08-14 15:10:42,726 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
23 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 15:10:44,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. limit=6.0 2024-08-14 15:10:49,052 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 15 from Vox, 54 fro AS 2024-08-14 15:10:54,549 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10750, loss[loss=0.09201, beats_loss=0.01303, ecapa_loss=0.0001469, whisper_loss=0.07751, over 21062.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001554, whisper_loss=0.09116, over 3906813.72 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:10:56,415 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 15:10:56,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2716020.0, ans=0.125 2024-08-14 15:11:14,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2716120.0, ans=0.125 2024-08-14 15:11:19,145 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 15:11:25,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2716220.0, ans=0.0 2024-08-14 15:11:40,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-14 15:12:00,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2716420.0, ans=0.2 2024-08-14 15:12:01,581 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
35 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 15:12:03,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2716420.0, ans=0.125 2024-08-14 15:12:09,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10800, loss[loss=0.1014, beats_loss=0.01156, ecapa_loss=0.0001365, whisper_loss=0.08846, over 22790.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001536, whisper_loss=0.09195, over 3932533.88 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:12:12,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2716520.0, ans=0.125 2024-08-14 15:12:23,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.404e+01 2.650e+01 3.101e+01 5.207e+01, threshold=5.300e+01, percent-clipped=0.0 2024-08-14 15:12:31,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2716620.0, ans=0.05 2024-08-14 15:12:34,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0 2024-08-14 15:12:40,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2716720.0, ans=0.1 2024-08-14 15:12:44,080 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 15:12:45,532 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 15:13:03,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2716820.0, ans=0.125 2024-08-14 15:13:04,490 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 15:13:19,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2716920.0, ans=0.125 2024-08-14 15:13:23,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10850, loss[loss=0.0995, beats_loss=0.01209, ecapa_loss=0.0001089, whisper_loss=0.08632, over 16160.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001545, whisper_loss=0.09155, over 3923053.71 frames. ], batch size: 61, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:14:30,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2717420.0, ans=0.1 2024-08-14 15:14:36,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2717420.0, ans=0.125 2024-08-14 15:14:36,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2717420.0, ans=0.2 2024-08-14 15:14:39,126 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10900, loss[loss=0.09392, beats_loss=0.01139, ecapa_loss=0.0001831, whisper_loss=0.0807, over 17921.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001546, whisper_loss=0.09169, over 3920602.18 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:14:50,463 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:14:51,738 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 15:14:52,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.325e+01 2.589e+01 2.879e+01 4.786e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-14 15:15:19,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2717720.0, ans=0.5 2024-08-14 15:15:21,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2717720.0, ans=0.1 2024-08-14 15:15:25,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2717820.0, ans=0.0 2024-08-14 15:15:28,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2717820.0, ans=0.125 2024-08-14 15:15:29,955 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 15:15:31,398 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 15:15:49,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2717920.0, ans=0.125 2024-08-14 15:15:53,875 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 10950, loss[loss=0.07482, beats_loss=0.01237, ecapa_loss=0.0001461, whisper_loss=0.06099, over 16312.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001548, whisper_loss=0.09058, over 3903295.26 frames. ], batch size: 67, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:15:55,434 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 15:15:58,367 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 15:16:08,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2718120.0, ans=0.0 2024-08-14 15:16:14,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=12.0 2024-08-14 15:16:24,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2024-08-14 15:16:28,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2718220.0, ans=0.0 2024-08-14 15:16:46,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2718320.0, ans=0.0 2024-08-14 15:17:10,046 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11000, loss[loss=0.09822, beats_loss=0.01205, ecapa_loss=0.0001521, whisper_loss=0.08465, over 20353.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001559, whisper_loss=0.09126, over 3934803.08 frames. ], batch size: 84, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:17:25,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.292e+01 2.575e+01 2.886e+01 4.359e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-14 15:17:35,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2718620.0, ans=0.0 2024-08-14 15:17:38,577 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 15:17:44,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-14 15:18:02,492 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 15:18:15,890 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 15:18:19,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2718920.0, ans=0.125 2024-08-14 15:18:23,908 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-14 15:18:24,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.45 vs. limit=15.0 2024-08-14 15:18:34,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11050, loss[loss=0.07911, beats_loss=0.01217, ecapa_loss=0.000102, whisper_loss=0.06592, over 14824.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.000157, whisper_loss=0.09109, over 3940096.24 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:18:37,676 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 15:18:41,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2719020.0, ans=0.1 2024-08-14 15:19:15,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2024-08-14 15:19:25,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2024-08-14 15:19:42,687 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.04 vs. 
limit=12.0 2024-08-14 15:19:43,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.74 vs. limit=10.0 2024-08-14 15:19:50,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2719420.0, ans=0.1 2024-08-14 15:19:53,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2719420.0, ans=0.125 2024-08-14 15:20:00,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11100, loss[loss=0.1144, beats_loss=0.009381, ecapa_loss=0.0001491, whisper_loss=0.1035, over 14080.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001569, whisper_loss=0.09148, over 3963838.88 frames. ], batch size: 54, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:20:14,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.445e+01 2.651e+01 2.947e+01 5.465e+01, threshold=5.303e+01, percent-clipped=1.0 2024-08-14 15:20:18,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0 2024-08-14 15:20:19,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2719620.0, ans=0.125 2024-08-14 15:20:27,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2719620.0, ans=0.125 2024-08-14 15:20:34,637 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:20:41,531 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 15:20:46,917 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=12.0 2024-08-14 15:20:55,687 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:20:57,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2719820.0, ans=0.125 2024-08-14 15:20:57,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-14 15:21:19,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11150, loss[loss=0.09912, beats_loss=0.007847, ecapa_loss=0.0001616, whisper_loss=0.08966, over 15522.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.0001559, whisper_loss=0.09173, over 3939735.70 frames. ], batch size: 62, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:21:27,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2720020.0, ans=0.0 2024-08-14 15:22:00,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2720220.0, ans=0.125 2024-08-14 15:22:01,286 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 12 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 15:22:04,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2720320.0, ans=0.035 2024-08-14 15:22:10,200 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 15:22:10,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2720320.0, ans=0.07 2024-08-14 15:22:30,562 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 15:22:33,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11200, loss[loss=0.1105, beats_loss=0.01039, ecapa_loss=0.000151, whisper_loss=0.09859, over 13763.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01054, ecapa_loss=0.0001559, whisper_loss=0.09232, over 3946321.33 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:22:35,211 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 15:22:46,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.435e+01 2.587e+01 2.892e+01 4.591e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 15:22:50,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2720620.0, ans=0.1 2024-08-14 15:23:09,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2720720.0, ans=0.1 2024-08-14 15:23:12,563 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 15:23:25,708 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 15:23:39,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2720920.0, ans=0.07 2024-08-14 15:23:47,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11250, loss[loss=0.08793, beats_loss=0.01137, ecapa_loss=0.0001364, whisper_loss=0.07519, over 18674.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001556, whisper_loss=0.09154, over 3901481.12 frames. ], batch size: 73, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:23:50,779 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 15:23:59,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2721020.0, ans=0.0 2024-08-14 15:24:06,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2721120.0, ans=0.2 2024-08-14 15:24:09,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.90 vs. limit=22.5 2024-08-14 15:24:12,015 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 15:24:15,441 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:24:28,006 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
27 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 15:24:40,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2721320.0, ans=0.125 2024-08-14 15:24:45,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2721320.0, ans=0.0 2024-08-14 15:24:54,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2721420.0, ans=0.125 2024-08-14 15:25:07,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2721520.0, ans=0.025 2024-08-14 15:25:08,390 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11300, loss[loss=0.07091, beats_loss=0.01308, ecapa_loss=0.0001589, whisper_loss=0.05624, over 20669.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01061, ecapa_loss=0.0001553, whisper_loss=0.09212, over 3915979.81 frames. ], batch size: 86, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:25:21,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.316e+01 2.542e+01 2.891e+01 3.051e+02, threshold=5.084e+01, percent-clipped=1.0 2024-08-14 15:25:26,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2721620.0, ans=0.125 2024-08-14 15:25:26,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2024-08-14 15:25:28,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2721620.0, ans=0.0 2024-08-14 15:25:29,558 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-14 15:25:31,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2721620.0, ans=0.125 2024-08-14 15:25:51,573 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 15:25:57,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2721820.0, ans=0.2 2024-08-14 15:26:20,952 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 15:26:25,467 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11350, loss[loss=0.1045, beats_loss=0.009705, ecapa_loss=0.0001561, whisper_loss=0.09323, over 16912.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01049, ecapa_loss=0.0001566, whisper_loss=0.0925, over 3902490.95 frames. ], batch size: 68, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:26:28,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-14 15:26:40,837 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 15:26:52,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2722120.0, ans=0.0 2024-08-14 15:27:07,553 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 15:27:09,726 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
30 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 15:27:12,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2722220.0, ans=0.125 2024-08-14 15:27:41,200 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2024-08-14 15:27:42,656 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 15:27:59,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11400, loss[loss=0.09131, beats_loss=0.01286, ecapa_loss=0.0001511, whisper_loss=0.07694, over 22973.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01052, ecapa_loss=0.0001563, whisper_loss=0.09239, over 3884870.35 frames. ], batch size: 94, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:28:01,284 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 15:28:01,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-14 15:28:13,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.371e+01 2.609e+01 2.947e+01 4.785e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-14 15:28:15,066 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-14 15:28:18,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2722620.0, ans=0.125 2024-08-14 15:28:33,623 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 15:28:41,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2722720.0, ans=0.0 2024-08-14 15:28:41,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2722720.0, ans=0.125 2024-08-14 15:28:44,682 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 15:28:48,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2024-08-14 15:29:25,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2722920.0, ans=0.125 2024-08-14 15:29:31,506 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11450, loss[loss=0.08693, beats_loss=0.01269, ecapa_loss=0.0001533, whisper_loss=0.07271, over 21371.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01053, ecapa_loss=0.0001583, whisper_loss=0.09177, over 3852198.75 frames. ], batch size: 91, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:29:45,472 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 15:29:48,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-14 15:30:02,071 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-14 15:30:50,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2723320.0, ans=0.0 2024-08-14 15:31:22,246 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
28 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 15:31:26,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2723420.0, ans=0.125 2024-08-14 15:31:30,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11500, loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001667, whisper_loss=0.09003, over 22874.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01052, ecapa_loss=0.0001567, whisper_loss=0.09237, over 3881240.46 frames. ], batch size: 94, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:31:36,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2723520.0, ans=0.1 2024-08-14 15:31:52,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.370e+01 2.644e+01 2.916e+01 4.086e+01, threshold=5.287e+01, percent-clipped=0.0 2024-08-14 15:31:57,124 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-14 15:32:02,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2024-08-14 15:32:05,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2723620.0, ans=0.1 2024-08-14 15:32:31,492 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 15:32:43,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2723820.0, ans=0.95 2024-08-14 15:32:47,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2723820.0, ans=0.0 2024-08-14 15:32:59,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2723820.0, ans=0.125 2024-08-14 15:33:14,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2723920.0, ans=0.2 2024-08-14 15:33:31,508 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11550, loss[loss=0.1128, beats_loss=0.009508, ecapa_loss=0.000149, whisper_loss=0.1018, over 22206.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01047, ecapa_loss=0.0001575, whisper_loss=0.09272, over 3889345.18 frames. ], batch size: 87, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:33:37,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2724020.0, ans=0.0 2024-08-14 15:33:47,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2724020.0, ans=0.2 2024-08-14 15:33:53,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2724120.0, ans=0.0 2024-08-14 15:34:06,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2724120.0, ans=0.2 2024-08-14 15:35:01,995 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 15:35:02,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2724420.0, ans=0.2 2024-08-14 15:35:16,381 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11600, loss[loss=0.1036, beats_loss=0.01037, ecapa_loss=0.00014, whisper_loss=0.09186, over 20513.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001572, whisper_loss=0.09169, over 3889425.65 frames. ], batch size: 80, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:35:22,069 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 22 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-14 15:35:25,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2724520.0, ans=0.2 2024-08-14 15:35:25,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.39 vs. limit=6.0 2024-08-14 15:35:28,041 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-14 15:35:29,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.402e+01 2.609e+01 2.881e+01 4.573e+01, threshold=5.219e+01, percent-clipped=0.0 2024-08-14 15:35:57,213 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 15:36:02,507 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 15:36:21,079 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 15:36:28,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11650, loss[loss=0.1015, beats_loss=0.01133, ecapa_loss=0.0001702, whisper_loss=0.08851, over 15735.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001568, whisper_loss=0.09117, over 3905287.50 frames. ], batch size: 65, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:36:34,660 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 fro AS
2024-08-14 15:36:38,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2725020.0, ans=0.125
2024-08-14 15:36:52,300 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS
2024-08-14 15:37:01,547 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS
2024-08-14 15:37:01,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2725220.0, ans=0.0
2024-08-14 15:37:13,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2725320.0, ans=0.125
2024-08-14 15:37:16,009 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS
2024-08-14 15:37:24,491 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 15:37:26,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2725320.0, ans=0.1
2024-08-14 15:37:26,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2725320.0, ans=0.125
2024-08-14 15:37:27,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2725420.0, ans=0.0
2024-08-14 15:37:30,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2725420.0, ans=0.1
2024-08-14 15:37:44,575 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11700, loss[loss=0.08723, beats_loss=0.01288, ecapa_loss=0.0001528, whisper_loss=0.07283, over 20773.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001559, whisper_loss=0.09096, over 3921545.60 frames. ], batch size: 84, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:37:59,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.338e+01 2.598e+01 2.950e+01 6.638e+01, threshold=5.196e+01, percent-clipped=2.0
2024-08-14 15:38:03,251 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 fro AS
2024-08-14 15:38:06,808 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 fro AS
2024-08-14 15:38:07,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2725620.0, ans=0.1
2024-08-14 15:38:20,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2725720.0, ans=0.0
2024-08-14 15:38:23,517 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS
2024-08-14 15:38:29,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2725720.0, ans=0.0
2024-08-14 15:38:31,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2024-08-14 15:38:36,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2725820.0, ans=0.125
2024-08-14 15:38:52,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0
2024-08-14 15:38:53,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2725920.0, ans=0.1
2024-08-14 15:39:00,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2725920.0, ans=0.1
2024-08-14 15:39:11,350 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11750, loss[loss=0.08688, beats_loss=0.01247, ecapa_loss=0.0001505, whisper_loss=0.0729, over 19905.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001555, whisper_loss=0.09054, over 3928469.88 frames. ], batch size: 83, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:39:11,611 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS
2024-08-14 15:39:17,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2726020.0, ans=0.0
2024-08-14 15:39:26,963 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS
2024-08-14 15:40:09,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2726320.0, ans=0.125
2024-08-14 15:40:25,537 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS
2024-08-14 15:40:32,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11800, loss[loss=0.08866, beats_loss=0.009479, ecapa_loss=0.0001422, whisper_loss=0.07776, over 21116.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01079, ecapa_loss=0.0001552, whisper_loss=0.09048, over 3921031.15 frames. ], batch size: 81, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:40:45,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.511e+01 2.719e+01 3.108e+01 4.014e+02, threshold=5.439e+01, percent-clipped=2.0
2024-08-14 15:40:46,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2726620.0, ans=0.0
2024-08-14 15:40:53,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0
2024-08-14 15:41:18,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2726820.0, ans=0.125
2024-08-14 15:41:19,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2726820.0, ans=0.1
2024-08-14 15:41:20,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2726820.0, ans=0.0
2024-08-14 15:41:30,714 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 15 from Vox, 45 fro AS
2024-08-14 15:41:33,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2726920.0, ans=0.0
2024-08-14 15:41:34,900 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS
2024-08-14 15:41:44,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11850, loss[loss=0.07252, beats_loss=0.0103, ecapa_loss=0.0001376, whisper_loss=0.06084, over 19729.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01085, ecapa_loss=0.0001538, whisper_loss=0.09006, over 3928271.85 frames. ], batch size: 80, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:42:17,406 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 23 from Vox, 25 fro AS
2024-08-14 15:42:17,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2727220.0, ans=0.95
2024-08-14 15:42:46,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2727420.0, ans=0.125
2024-08-14 15:42:51,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2727420.0, ans=0.09899494936611666
2024-08-14 15:42:56,343 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11900, loss[loss=0.1119, beats_loss=0.01107, ecapa_loss=0.0001783, whisper_loss=0.0991, over 21099.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01084, ecapa_loss=0.0001531, whisper_loss=0.09016, over 3941962.40 frames. ], batch size: 88, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:43:09,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.301e+01 2.664e+01 2.917e+01 5.181e+01, threshold=5.327e+01, percent-clipped=0.0
2024-08-14 15:43:15,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2727620.0, ans=0.125
2024-08-14 15:43:23,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2727620.0, ans=0.2
2024-08-14 15:43:29,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2727720.0, ans=0.125
2024-08-14 15:43:35,085 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS
2024-08-14 15:43:37,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2727720.0, ans=0.125
2024-08-14 15:43:53,946 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS
2024-08-14 15:43:54,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2727920.0, ans=0.2
2024-08-14 15:43:54,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2727920.0, ans=0.125
2024-08-14 15:44:09,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 11950, loss[loss=0.08491, beats_loss=0.01145, ecapa_loss=0.000151, whisper_loss=0.07195, over 14624.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001548, whisper_loss=0.09014, over 3896486.25 frames. ], batch size: 58, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:44:11,626 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 20 from Vox, 17 fro AS
2024-08-14 15:44:14,595 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS
2024-08-14 15:44:30,772 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 12 from Vox, 38 fro AS
2024-08-14 15:44:32,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2728120.0, ans=0.1
2024-08-14 15:45:02,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2728320.0, ans=0.125
2024-08-14 15:45:08,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0
2024-08-14 15:45:16,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5
2024-08-14 15:45:23,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12000, loss[loss=0.09231, beats_loss=0.01029, ecapa_loss=0.000135, whisper_loss=0.08067, over 16703.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01082, ecapa_loss=0.0001532, whisper_loss=0.08992, over 3901158.41 frames. ], batch size: 62, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:45:23,069 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-14 15:46:00,641 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.000545, whisper_loss=0.2473, over 922467.00 frames.
2024-08-14 15:46:18,273 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on SV_voxceleb1: loss=0.004271, beats_loss=0, ecapa_loss=0.0004271, whisper_loss=0, over 939242.00 frames.
2024-08-14 15:48:10,005 INFO [train_multi_KD3.py:1149] (1/4) Epoch 19, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 15:48:10,008 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-14 15:48:16,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2728520.0, ans=0.125
2024-08-14 15:48:18,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2728520.0, ans=10.0
2024-08-14 15:48:21,000 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS
2024-08-14 15:48:23,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.361e+01 2.603e+01 2.893e+01 4.151e+01, threshold=5.206e+01, percent-clipped=0.0
2024-08-14 15:48:33,352 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 fro AS
2024-08-14 15:48:44,996 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-14 15:48:53,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5
2024-08-14 15:49:19,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5
2024-08-14 15:49:25,032 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12050, loss[loss=0.1024, beats_loss=0.009584, ecapa_loss=0.0001742, whisper_loss=0.09108, over 17511.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001541, whisper_loss=0.09007, over 3892239.62 frames. ], batch size: 70, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:49:25,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2729020.0, ans=0.0
2024-08-14 15:49:26,759 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 fro AS
2024-08-14 15:49:43,386 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS
2024-08-14 15:49:51,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=12.0
2024-08-14 15:50:10,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2729320.0, ans=0.125
2024-08-14 15:50:10,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2024-08-14 15:50:18,975 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS
2024-08-14 15:50:26,154 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS
2024-08-14 15:50:33,843 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 26 from Vox, 32 fro AS
2024-08-14 15:50:35,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0
2024-08-14 15:50:39,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12100, loss[loss=0.1105, beats_loss=0.009501, ecapa_loss=0.0001804, whisper_loss=0.0992, over 22074.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001542, whisper_loss=0.09101, over 3910045.32 frames. ], batch size: 92, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:50:44,157 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-14 15:50:51,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2729520.0, ans=0.1
2024-08-14 15:50:52,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.283e+01 2.551e+01 2.892e+01 3.951e+01, threshold=5.101e+01, percent-clipped=0.0
2024-08-14 15:50:57,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729620.0, ans=0.1
2024-08-14 15:50:59,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0
2024-08-14 15:51:02,993 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 31 from Vox, 33 fro AS
2024-08-14 15:51:08,570 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS
2024-08-14 15:51:18,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0
2024-08-14 15:51:39,141 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS
2024-08-14 15:51:51,832 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12150, loss[loss=0.09579, beats_loss=0.01086, ecapa_loss=0.0001296, whisper_loss=0.08364, over 13544.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001545, whisper_loss=0.09105, over 3887583.27 frames. ], batch size: 53, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:51:52,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2730020.0, ans=0.0
2024-08-14 15:51:52,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2730020.0, ans=0.2
2024-08-14 15:51:58,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0
2024-08-14 15:52:08,159 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS
2024-08-14 15:52:21,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0
2024-08-14 15:52:22,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2730220.0, ans=0.1
2024-08-14 15:52:22,637 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0
2024-08-14 15:52:27,859 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 15:52:29,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0
2024-08-14 15:52:30,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2730220.0, ans=0.125
2024-08-14 15:52:38,043 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 15 from Vox, 42 fro AS
2024-08-14 15:52:47,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2730320.0, ans=0.2
2024-08-14 15:52:59,316 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS
2024-08-14 15:53:06,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12200, loss[loss=0.107, beats_loss=0.00796, ecapa_loss=0.0001907, whisper_loss=0.09718, over 19681.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001542, whisper_loss=0.09098, over 3868557.83 frames. ], batch size: 81, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:53:19,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.397e+01 2.639e+01 2.869e+01 4.830e+01, threshold=5.277e+01, percent-clipped=0.0
2024-08-14 15:53:55,276 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS
2024-08-14 15:54:14,295 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 fro AS
2024-08-14 15:54:17,014 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS
2024-08-14 15:54:19,673 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12250, loss[loss=0.1033, beats_loss=0.009771, ecapa_loss=0.000153, whisper_loss=0.09196, over 22867.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001555, whisper_loss=0.09075, over 3853553.89 frames. ], batch size: 92, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:55:12,886 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS
2024-08-14 15:55:32,674 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12300, loss[loss=0.1048, beats_loss=0.01133, ecapa_loss=0.0001257, whisper_loss=0.09222, over 17140.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001556, whisper_loss=0.0907, over 3830364.70 frames. ], batch size: 66, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:55:46,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.391e+01 2.726e+01 3.127e+01 1.434e+02, threshold=5.452e+01, percent-clipped=1.0
2024-08-14 15:55:49,684 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.146e-02
2024-08-14 15:55:54,446 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0
2024-08-14 15:56:09,976 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS
2024-08-14 15:56:12,658 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 20 from Vox, 43 fro AS
2024-08-14 15:56:15,567 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS
2024-08-14 15:56:22,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2731820.0, ans=0.0
2024-08-14 15:56:30,192 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS
2024-08-14 15:56:34,684 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS
2024-08-14 15:56:39,231 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS
2024-08-14 15:56:46,342 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12350, loss[loss=0.07899, beats_loss=0.01254, ecapa_loss=0.0001526, whisper_loss=0.06492, over 18187.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001566, whisper_loss=0.09063, over 3851205.73 frames. ], batch size: 76, lr: 3.26e-03, grad_scale: 1.152921504606847e+18
2024-08-14 15:56:52,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2732020.0, ans=0.125
2024-08-14 15:56:59,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2732120.0, ans=0.125
2024-08-14 15:57:15,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0
2024-08-14 15:57:15,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0
2024-08-14 15:57:17,353 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS
2024-08-14 15:57:47,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2732420.0, ans=0.0
2024-08-14 15:57:59,045 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS
2024-08-14 15:58:00,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12400, loss[loss=0.09869, beats_loss=0.01169, ecapa_loss=0.0001542, whisper_loss=0.08545, over 20676.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001572, whisper_loss=0.09082, over 3838599.12 frames. ], batch size: 83, lr: 3.26e-03, grad_scale: 1.152921504606847e+18
2024-08-14 15:58:13,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.330e+01 2.578e+01 2.980e+01 5.348e+02, threshold=5.156e+01, percent-clipped=2.0
2024-08-14 15:58:31,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2732720.0, ans=0.0
2024-08-14 15:59:07,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=2732920.0, ans=12.0
2024-08-14 15:59:08,106 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 10 from Vox, 36 fro AS
2024-08-14 15:59:11,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2732920.0, ans=0.125
2024-08-14 15:59:14,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12450, loss[loss=0.1169, beats_loss=0.008381, ecapa_loss=0.0001446, whisper_loss=0.1071, over 17179.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001571, whisper_loss=0.09104, over 3856013.74 frames. ], batch size: 64, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:59:15,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2733020.0, ans=0.0
2024-08-14 15:59:16,688 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-14 15:59:22,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2733020.0, ans=0.04949747468305833
2024-08-14 15:59:39,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0
2024-08-14 16:00:08,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2733320.0, ans=0.1
2024-08-14 16:00:19,200 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS
2024-08-14 16:00:30,570 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12500, loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001626, whisper_loss=0.09186, over 16975.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001555, whisper_loss=0.091, over 3879499.78 frames. ], batch size: 68, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:00:45,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5
2024-08-14 16:00:45,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.338e+01 2.506e+01 2.817e+01 7.820e+01, threshold=5.011e+01, percent-clipped=1.0
2024-08-14 16:01:04,250 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS
2024-08-14 16:01:17,385 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS
2024-08-14 16:01:25,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2733820.0, ans=0.0
2024-08-14 16:01:32,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=2733920.0, ans=12.0
2024-08-14 16:01:46,196 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12550, loss[loss=0.08499, beats_loss=0.01104, ecapa_loss=0.000208, whisper_loss=0.07187, over 13077.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001555, whisper_loss=0.09139, over 3879895.58 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:01:59,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2734120.0, ans=0.125
2024-08-14 16:02:21,556 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS
2024-08-14 16:02:37,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0
2024-08-14 16:02:38,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2734320.0, ans=0.1
2024-08-14 16:03:00,561 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12600, loss[loss=0.1161, beats_loss=0.009003, ecapa_loss=0.0001855, whisper_loss=0.1052, over 17157.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001555, whisper_loss=0.09121, over 3873625.02 frames. ], batch size: 69, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:03:00,824 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 17 from Vox, 34 fro AS
2024-08-14 16:03:06,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2734520.0, ans=0.125
2024-08-14 16:03:14,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.270e+01 2.592e+01 3.036e+01 4.281e+01, threshold=5.185e+01, percent-clipped=0.0
2024-08-14 16:03:15,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.41 vs. limit=15.0
2024-08-14 16:03:25,016 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-14 16:03:25,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2734620.0, ans=0.05
2024-08-14 16:03:30,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2734720.0, ans=0.125
2024-08-14 16:03:39,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2734720.0, ans=0.0
2024-08-14 16:03:40,504 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS
2024-08-14 16:03:45,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2734820.0, ans=0.125
2024-08-14 16:03:52,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2734820.0, ans=0.5
2024-08-14 16:04:14,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12650, loss[loss=0.1095, beats_loss=0.009999, ecapa_loss=0.0001334, whisper_loss=0.09819, over 23982.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001556, whisper_loss=0.09061, over 3908541.91 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:04:24,859 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 32 from Vox, 34 fro AS
2024-08-14 16:04:29,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2735120.0, ans=0.125
2024-08-14 16:04:36,641 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS
2024-08-14 16:04:45,970 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 fro AS
2024-08-14 16:04:46,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2735220.0, ans=0.125
2024-08-14 16:04:50,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2735220.0, ans=0.0
2024-08-14 16:05:26,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2735520.0, ans=0.125
2024-08-14 16:05:27,834 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12700, loss[loss=0.09475, beats_loss=0.01215, ecapa_loss=0.0001451, whisper_loss=0.08114, over 17654.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001559, whisper_loss=0.09053, over 3897115.99 frames. ], batch size: 71, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:05:41,769 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 16:05:42,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.369e+01 2.524e+01 2.927e+01 4.569e+01, threshold=5.048e+01, percent-clipped=0.0
2024-08-14 16:05:45,722 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS
2024-08-14 16:05:51,645 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS
2024-08-14 16:05:54,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-08-14 16:05:56,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2735720.0, ans=0.0
2024-08-14 16:06:01,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=22.5
2024-08-14 16:06:26,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2735920.0, ans=0.125
2024-08-14 16:06:31,008 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS
2024-08-14 16:06:34,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2735920.0, ans=0.125
2024-08-14 16:06:41,533 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12750, loss[loss=0.09529, beats_loss=0.01292, ecapa_loss=0.0001414, whisper_loss=0.08096, over 15481.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001559, whisper_loss=0.09041, over 3911809.16 frames. ], batch size: 63, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:06:52,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2736020.0, ans=0.0
2024-08-14 16:06:54,457 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS
2024-08-14 16:07:12,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2736220.0, ans=0.125
2024-08-14 16:07:28,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2736320.0, ans=0.125
2024-08-14 16:07:38,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2736420.0, ans=0.125
2024-08-14 16:07:50,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2736420.0, ans=0.125
2024-08-14 16:07:55,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12800, loss[loss=0.0956, beats_loss=0.01228, ecapa_loss=0.0001481, whisper_loss=0.08184, over 23242.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0001567, whisper_loss=0.09004, over 3891537.94 frames. ], batch size: 98, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:08:09,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.300e+01 2.515e+01 2.756e+01 3.404e+01, threshold=5.031e+01, percent-clipped=0.0
2024-08-14 16:08:20,186 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 24 from Vox, 25 fro AS
2024-08-14 16:08:24,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2736720.0, ans=0.0
2024-08-14 16:08:28,822 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 13 from LS+wenet, 22 from Vox, 28 fro AS
2024-08-14 16:08:31,732 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS
2024-08-14 16:08:52,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0
2024-08-14 16:09:06,145 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS
2024-08-14 16:09:09,129 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12850, loss[loss=0.08544, beats_loss=0.01362, ecapa_loss=0.0001368, whisper_loss=0.07046, over 18582.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01083, ecapa_loss=0.0001569, whisper_loss=0.08959, over 3872305.39 frames. ], batch size: 77, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:09:12,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737020.0, ans=0.1
2024-08-14 16:09:22,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2737120.0, ans=0.125
2024-08-14 16:09:56,564 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 17 from LS+wenet, 26 from Vox, 38 fro AS
2024-08-14 16:10:14,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2737420.0, ans=0.125
2024-08-14 16:10:16,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2737420.0, ans=0.0
2024-08-14 16:10:21,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12900, loss[loss=0.1115, beats_loss=0.009574, ecapa_loss=0.0001954, whisper_loss=0.1, over 21289.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01086, ecapa_loss=0.000156, whisper_loss=0.08945, over 3850558.62 frames. ], batch size: 89, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:10:22,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs.
limit=15.0 2024-08-14 16:10:35,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.212e+01 2.559e+01 2.809e+01 4.062e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-14 16:11:05,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2737820.0, ans=0.0 2024-08-14 16:11:17,291 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 16:11:22,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-08-14 16:11:26,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2737920.0, ans=0.0 2024-08-14 16:11:34,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 12950, loss[loss=0.0924, beats_loss=0.01152, ecapa_loss=0.0001913, whisper_loss=0.07896, over 20636.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001574, whisper_loss=0.08963, over 3841227.19 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:11:37,664 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 16:11:42,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2738020.0, ans=0.1 2024-08-14 16:11:57,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=12.0 2024-08-14 16:12:05,646 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 16:12:06,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2738220.0, ans=0.2 2024-08-14 16:12:12,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2738220.0, ans=0.0 2024-08-14 16:12:24,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-14 16:12:28,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2738320.0, ans=0.2 2024-08-14 16:12:49,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13000, loss[loss=0.1253, beats_loss=0.008477, ecapa_loss=0.0001893, whisper_loss=0.1149, over 21662.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01072, ecapa_loss=0.0001576, whisper_loss=0.08988, over 3887473.28 frames. ], batch size: 88, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:12:51,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2738520.0, ans=0.04949747468305833 2024-08-14 16:12:52,590 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 16:12:54,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2738520.0, ans=0.0 2024-08-14 16:12:59,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. 
limit=15.0 2024-08-14 16:13:04,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.365e+01 2.543e+01 2.775e+01 1.627e+02, threshold=5.086e+01, percent-clipped=3.0 2024-08-14 16:13:14,649 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0 2024-08-14 16:13:15,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2738620.0, ans=0.1 2024-08-14 16:13:41,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2738820.0, ans=0.1 2024-08-14 16:13:42,236 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 16:14:03,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2738920.0, ans=0.1 2024-08-14 16:14:05,326 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13050, loss[loss=0.128, beats_loss=0.009233, ecapa_loss=0.000154, whisper_loss=0.1172, over 22209.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001574, whisper_loss=0.09023, over 3859190.15 frames. ], batch size: 85, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:14:20,498 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 17 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-14 16:14:23,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2739120.0, ans=0.125 2024-08-14 16:14:37,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2024-08-14 16:14:39,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2739220.0, ans=0.1 2024-08-14 16:14:46,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-14 16:15:01,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2739320.0, ans=0.0 2024-08-14 16:15:05,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2739420.0, ans=0.125 2024-08-14 16:15:06,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2739420.0, ans=0.0 2024-08-14 16:15:18,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13100, loss[loss=0.07235, beats_loss=0.0137, ecapa_loss=0.0001637, whisper_loss=0.05701, over 14230.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01076, ecapa_loss=0.000156, whisper_loss=0.08898, over 3841854.84 frames. 
], batch size: 59, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:15:19,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2739520.0, ans=0.125 2024-08-14 16:15:22,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2739520.0, ans=0.125 2024-08-14 16:15:31,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2739520.0, ans=0.0 2024-08-14 16:15:31,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2739520.0, ans=0.125 2024-08-14 16:15:33,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2024-08-14 16:15:33,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.291e+01 2.498e+01 2.880e+01 4.346e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-14 16:15:41,810 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 16:16:04,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2739820.0, ans=0.0 2024-08-14 16:16:04,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.19 vs. 
limit=10.0 2024-08-14 16:16:07,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2739820.0, ans=0.1 2024-08-14 16:16:20,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2739920.0, ans=0.125 2024-08-14 16:16:23,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2739920.0, ans=0.125 2024-08-14 16:16:23,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2739920.0, ans=0.0 2024-08-14 16:16:23,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2739920.0, ans=0.1 2024-08-14 16:16:29,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.0 2024-08-14 16:16:33,374 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13150, loss[loss=0.08228, beats_loss=0.01313, ecapa_loss=0.0001271, whisper_loss=0.06788, over 15110.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01084, ecapa_loss=0.0001554, whisper_loss=0.08896, over 3844822.74 frames. ], batch size: 59, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:16:38,084 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 16:16:40,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2740020.0, ans=0.2 2024-08-14 16:16:44,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2740020.0, ans=0.0 2024-08-14 16:16:46,958 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
30 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 16:17:11,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2740220.0, ans=0.1 2024-08-14 16:17:12,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2740220.0, ans=0.125 2024-08-14 16:17:45,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=22.5 2024-08-14 16:17:47,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13200, loss[loss=0.08958, beats_loss=0.01213, ecapa_loss=0.0001355, whisper_loss=0.07609, over 20139.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01074, ecapa_loss=0.0001554, whisper_loss=0.08936, over 3809145.01 frames. ], batch size: 81, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:17:56,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2740520.0, ans=0.125 2024-08-14 16:17:57,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2740520.0, ans=0.0 2024-08-14 16:18:02,624 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.404e+01 2.825e+01 3.249e+01 1.605e+02, threshold=5.649e+01, percent-clipped=1.0 2024-08-14 16:18:07,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2740620.0, ans=0.125 2024-08-14 16:18:08,660 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 16:18:16,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2740720.0, ans=0.0 2024-08-14 16:18:43,651 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
24 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 16:18:53,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2740920.0, ans=0.0 2024-08-14 16:19:00,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13250, loss[loss=0.113, beats_loss=0.009111, ecapa_loss=0.0001551, whisper_loss=0.1024, over 18764.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01079, ecapa_loss=0.0001554, whisper_loss=0.08887, over 3780434.76 frames. ], batch size: 74, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:19:01,219 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 16:19:21,144 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 16:19:26,927 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 16:19:40,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2741220.0, ans=0.0 2024-08-14 16:19:43,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2741320.0, ans=0.125 2024-08-14 16:19:44,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2741320.0, ans=0.125 2024-08-14 16:19:47,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2741320.0, ans=0.125 2024-08-14 16:19:50,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=12.0 2024-08-14 16:19:53,888 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-14 16:20:07,219 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 16:20:08,706 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 16:20:12,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13300, loss[loss=0.09969, beats_loss=0.01107, ecapa_loss=0.000129, whisper_loss=0.08733, over 20619.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01076, ecapa_loss=0.0001541, whisper_loss=0.08913, over 3767358.56 frames. ], batch size: 80, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:20:27,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.357e+01 2.636e+01 2.927e+01 4.489e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 16:21:12,726 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-14 16:21:26,659 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13350, loss[loss=0.06607, beats_loss=0.01245, ecapa_loss=0.0001304, whisper_loss=0.05232, over 14312.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0108, ecapa_loss=0.0001526, whisper_loss=0.0894, over 3792211.45 frames. ], batch size: 58, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:21:34,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2742020.0, ans=0.0 2024-08-14 16:21:40,450 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 16:21:46,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2742120.0, ans=0.1 2024-08-14 16:22:31,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2742420.0, ans=0.125 2024-08-14 16:22:40,975 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13400, loss[loss=0.08979, beats_loss=0.01303, ecapa_loss=0.000141, whisper_loss=0.07535, over 17289.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01085, ecapa_loss=0.0001524, whisper_loss=0.08918, over 3818027.40 frames. ], batch size: 72, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:22:55,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.422e+01 2.683e+01 3.045e+01 1.877e+02, threshold=5.367e+01, percent-clipped=2.0 2024-08-14 16:23:11,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2742720.0, ans=0.1 2024-08-14 16:23:14,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2742720.0, ans=0.05 2024-08-14 16:23:26,614 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2024-08-14 16:23:27,219 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-14 16:23:34,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2742820.0, ans=0.1 2024-08-14 16:23:45,435 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2024-08-14 16:23:46,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2742920.0, ans=0.025 2024-08-14 16:23:53,462 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 16:23:54,567 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13450, loss[loss=0.1014, beats_loss=0.01129, ecapa_loss=0.0001578, whisper_loss=0.08851, over 21888.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0108, ecapa_loss=0.0001526, whisper_loss=0.08986, over 3833871.61 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:24:02,339 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 16:24:04,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0 2024-08-14 16:24:09,364 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 16:24:14,041 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 16:24:33,536 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 16:24:49,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.41 vs. 
limit=10.0 2024-08-14 16:24:54,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2743420.0, ans=0.1 2024-08-14 16:24:58,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2743420.0, ans=0.125 2024-08-14 16:25:08,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13500, loss[loss=0.09223, beats_loss=0.01259, ecapa_loss=0.00014, whisper_loss=0.07824, over 23349.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001541, whisper_loss=0.09041, over 3862742.96 frames. ], batch size: 95, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:25:15,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0 2024-08-14 16:25:19,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2743520.0, ans=0.1 2024-08-14 16:25:23,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.281e+01 2.536e+01 2.815e+01 4.454e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-14 16:25:23,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2743620.0, ans=0.0 2024-08-14 16:25:25,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2743620.0, ans=0.0 2024-08-14 16:25:31,321 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2024-08-14 16:25:48,393 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 16:25:50,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2743720.0, ans=0.0 2024-08-14 16:25:50,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2743720.0, ans=0.125 2024-08-14 16:26:03,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2743820.0, ans=0.125 2024-08-14 16:26:04,701 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 14 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 16:26:07,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2743920.0, ans=0.0 2024-08-14 16:26:09,100 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 16:26:18,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2743920.0, ans=0.05 2024-08-14 16:26:21,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2744020.0, ans=0.125 2024-08-14 16:26:22,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13550, loss[loss=0.118, beats_loss=0.009035, ecapa_loss=0.000131, whisper_loss=0.1077, over 17067.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001533, whisper_loss=0.09049, over 3854174.82 frames. ], batch size: 61, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:26:22,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2744020.0, ans=0.1 2024-08-14 16:26:28,562 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 16:26:40,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2744120.0, ans=0.1 2024-08-14 16:26:44,426 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 34 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 16:26:44,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2744120.0, ans=0.2 2024-08-14 16:26:45,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2024-08-14 16:26:48,988 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 16:26:59,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-08-14 16:27:00,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2744220.0, ans=0.0 2024-08-14 16:27:00,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2744220.0, ans=0.0 2024-08-14 16:27:03,408 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 16:27:06,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2744320.0, ans=0.125 2024-08-14 16:27:07,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2744320.0, ans=10.0 2024-08-14 16:27:12,112 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
18 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-14 16:27:34,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13600, loss[loss=0.09276, beats_loss=0.01214, ecapa_loss=0.0001453, whisper_loss=0.07916, over 21543.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.000155, whisper_loss=0.09055, over 3884274.14 frames. ], batch size: 88, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:27:37,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2744520.0, ans=0.05 2024-08-14 16:27:45,098 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 16:27:49,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.289e+01 2.556e+01 2.921e+01 4.683e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-14 16:27:58,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.95 vs. limit=15.0 2024-08-14 16:28:02,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2744720.0, ans=0.0 2024-08-14 16:28:12,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2744720.0, ans=0.0 2024-08-14 16:28:14,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2744720.0, ans=0.125 2024-08-14 16:28:31,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2744820.0, ans=0.1 2024-08-14 16:28:45,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.12 vs. 
limit=10.0 2024-08-14 16:28:48,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13650, loss[loss=0.09791, beats_loss=0.01139, ecapa_loss=0.0001405, whisper_loss=0.08511, over 18363.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001547, whisper_loss=0.09058, over 3912483.62 frames. ], batch size: 71, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:29:22,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2745220.0, ans=0.0 2024-08-14 16:29:27,755 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 16:29:29,642 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.100e+05 2024-08-14 16:29:40,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2745320.0, ans=0.0 2024-08-14 16:29:49,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2024-08-14 16:29:51,334 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 16:29:54,194 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 16:30:02,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13700, loss[loss=0.08926, beats_loss=0.01315, ecapa_loss=9.82e-05, whisper_loss=0.07512, over 15838.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001537, whisper_loss=0.09105, over 3909930.20 frames. 
], batch size: 59, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:30:02,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2745520.0, ans=0.2 2024-08-14 16:30:04,101 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 16:30:04,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2745520.0, ans=0.2 2024-08-14 16:30:06,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=22.5 2024-08-14 16:30:14,303 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 16:30:16,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.313e+01 2.534e+01 2.793e+01 4.098e+01, threshold=5.069e+01, percent-clipped=0.0 2024-08-14 16:30:37,229 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 16:30:38,536 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 23 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-14 16:30:41,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2745720.0, ans=0.0 2024-08-14 16:30:53,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2745820.0, ans=0.125 2024-08-14 16:30:54,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2745820.0, ans=0.0 2024-08-14 16:31:14,673 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13750, loss[loss=0.1102, beats_loss=0.008789, ecapa_loss=0.0001663, whisper_loss=0.09971, over 20267.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001535, whisper_loss=0.09123, over 3906668.63 frames. ], batch size: 81, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:31:18,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2746020.0, ans=0.125 2024-08-14 16:31:24,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. limit=10.0 2024-08-14 16:31:28,020 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 16:31:37,523 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.33 vs. limit=10.0 2024-08-14 16:31:49,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2746220.0, ans=0.125 2024-08-14 16:31:58,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-08-14 16:32:09,283 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 16:32:12,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2746420.0, ans=0.125 2024-08-14 16:32:12,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2746420.0, ans=0.2 2024-08-14 16:32:28,905 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13800, loss[loss=0.1112, beats_loss=0.01134, ecapa_loss=0.0001535, whisper_loss=0.09836, over 22941.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001534, whisper_loss=0.09112, over 3877134.48 frames. 
], batch size: 91, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:32:45,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.356e+01 2.629e+01 2.983e+01 1.767e+02, threshold=5.258e+01, percent-clipped=3.0 2024-08-14 16:32:50,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2746620.0, ans=0.05 2024-08-14 16:32:52,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2746620.0, ans=0.0 2024-08-14 16:32:59,138 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 16:33:09,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=15.0 2024-08-14 16:33:13,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2024-08-14 16:33:20,146 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 16:33:43,047 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13850, loss[loss=0.06507, beats_loss=0.01286, ecapa_loss=0.0001618, whisper_loss=0.05059, over 17385.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001533, whisper_loss=0.09163, over 3880380.71 frames. ], batch size: 74, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:33:53,999 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-14 16:33:57,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2747120.0, ans=0.05 2024-08-14 16:34:10,128 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 16:34:12,838 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 16:34:14,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2747220.0, ans=0.125 2024-08-14 16:34:28,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-14 16:34:31,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2747320.0, ans=0.125 2024-08-14 16:34:40,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2747420.0, ans=0.125 2024-08-14 16:34:53,797 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 16:34:56,686 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13900, loss[loss=0.08707, beats_loss=0.01243, ecapa_loss=0.0001547, whisper_loss=0.07309, over 19495.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001542, whisper_loss=0.09139, over 3879754.26 frames. 
], batch size: 82, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:35:12,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.429e+01 2.660e+01 3.144e+01 1.636e+02, threshold=5.320e+01, percent-clipped=3.0 2024-08-14 16:35:26,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2747720.0, ans=0.1 2024-08-14 16:35:32,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2747720.0, ans=0.125 2024-08-14 16:35:56,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2747920.0, ans=0.0 2024-08-14 16:35:57,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2747920.0, ans=0.125 2024-08-14 16:36:09,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 13950, loss[loss=0.1013, beats_loss=0.006348, ecapa_loss=0.0001643, whisper_loss=0.09327, over 13987.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0106, ecapa_loss=0.000155, whisper_loss=0.09206, over 3898364.68 frames. ], batch size: 53, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:36:28,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-14 16:36:30,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2748120.0, ans=0.125 2024-08-14 16:36:36,711 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 16:36:58,505 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 16:36:59,979 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 16:37:01,735 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=12.0 2024-08-14 16:37:05,196 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-14 16:37:22,373 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14000, loss[loss=0.1117, beats_loss=0.01106, ecapa_loss=0.0001743, whisper_loss=0.09893, over 23002.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001539, whisper_loss=0.09129, over 3888349.19 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:37:31,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2748520.0, ans=0.0 2024-08-14 16:37:37,746 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 16:37:38,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.629e+01 3.019e+01 1.116e+02, threshold=5.259e+01, percent-clipped=1.0 2024-08-14 16:37:46,628 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 36 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 16:38:08,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2748820.0, ans=0.0 2024-08-14 16:38:25,234 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 16:38:27,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2748920.0, ans=0.125 2024-08-14 16:38:35,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2749020.0, ans=0.0 2024-08-14 16:38:36,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14050, loss[loss=0.09907, beats_loss=0.009107, ecapa_loss=0.0002031, whisper_loss=0.08793, over 13934.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001541, whisper_loss=0.09095, over 3888296.05 frames. ], batch size: 55, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:38:41,512 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 16:38:51,753 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 16:38:56,799 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 16:38:57,026 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 16:39:01,023 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-14 16:39:01,369 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 16:39:06,974 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 25 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-14 16:39:30,121 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 16:39:34,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. 
limit=10.0 2024-08-14 16:39:40,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2749420.0, ans=0.0 2024-08-14 16:39:41,504 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 16:39:47,596 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 16:39:47,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2749420.0, ans=0.125 2024-08-14 16:39:47,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2749420.0, ans=0.0 2024-08-14 16:39:50,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14100, loss[loss=0.09946, beats_loss=0.01006, ecapa_loss=0.000168, whisper_loss=0.08772, over 15090.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001545, whisper_loss=0.09094, over 3876926.23 frames. ], batch size: 59, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:40:02,427 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 16:40:05,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2749620.0, ans=0.125 2024-08-14 16:40:06,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.359e+01 2.545e+01 2.723e+01 7.272e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-14 16:40:08,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2749620.0, ans=0.5 2024-08-14 16:40:13,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2749620.0, ans=0.2 2024-08-14 16:40:20,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2749720.0, ans=0.125 2024-08-14 16:40:48,007 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-14 16:40:54,202 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 16:41:04,221 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14150, loss[loss=0.1133, beats_loss=0.0105, ecapa_loss=0.0001762, whisper_loss=0.1011, over 21610.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001555, whisper_loss=0.09076, over 3861287.25 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:41:05,996 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 16:41:19,220 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 16:41:34,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2750220.0, ans=0.125 2024-08-14 16:41:36,830 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 16:41:47,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2750320.0, ans=0.125 2024-08-14 16:42:05,345 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 15 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 16:42:18,185 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14200, loss[loss=0.1244, beats_loss=0.008352, ecapa_loss=0.0001546, whisper_loss=0.1145, over 16250.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001549, whisper_loss=0.09094, over 3871395.75 frames. ], batch size: 60, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:42:21,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2750520.0, ans=0.125 2024-08-14 16:42:33,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2750620.0, ans=0.2 2024-08-14 16:42:34,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.417e+01 2.674e+01 2.957e+01 3.053e+02, threshold=5.348e+01, percent-clipped=2.0 2024-08-14 16:42:36,037 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 16:42:39,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2750620.0, ans=0.1 2024-08-14 16:42:44,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2750620.0, ans=0.1 2024-08-14 16:42:50,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2024-08-14 16:42:53,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2750720.0, ans=0.125 2024-08-14 16:42:53,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2750720.0, ans=0.5 2024-08-14 16:43:06,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2750820.0, ans=0.125 2024-08-14 16:43:08,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2750820.0, ans=0.2 2024-08-14 16:43:09,167 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 16:43:32,653 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14250, loss[loss=0.1048, beats_loss=0.01279, ecapa_loss=9.942e-05, whisper_loss=0.091, over 21640.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.0001527, whisper_loss=0.09061, over 3912038.42 frames. 
], batch size: 83, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:43:48,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2751120.0, ans=0.0 2024-08-14 16:44:02,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2751220.0, ans=0.125 2024-08-14 16:44:07,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2751220.0, ans=0.125 2024-08-14 16:44:15,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2024-08-14 16:44:17,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2751320.0, ans=0.125 2024-08-14 16:44:24,305 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 16:44:45,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14300, loss[loss=0.1032, beats_loss=0.009579, ecapa_loss=0.0001425, whisper_loss=0.09224, over 23474.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001536, whisper_loss=0.09044, over 3902602.72 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:44:52,652 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 16:44:54,032 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 16:44:59,809 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
30 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 16:45:02,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.444e+01 2.637e+01 2.966e+01 4.430e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-14 16:45:18,748 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 16:45:20,093 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 16:45:35,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2751820.0, ans=0.0 2024-08-14 16:45:40,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-08-14 16:45:59,932 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14350, loss[loss=0.09375, beats_loss=0.01068, ecapa_loss=0.0001453, whisper_loss=0.08162, over 18481.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001539, whisper_loss=0.09077, over 3908551.28 frames. ], batch size: 73, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:46:08,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0 2024-08-14 16:46:13,657 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 16:46:25,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2752120.0, ans=0.125 2024-08-14 16:46:35,780 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 32 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 16:46:36,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2752220.0, ans=0.2 2024-08-14 16:46:42,530 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 16:46:42,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2752220.0, ans=0.2 2024-08-14 16:46:48,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2752320.0, ans=0.125 2024-08-14 16:47:16,531 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14400, loss[loss=0.1118, beats_loss=0.01102, ecapa_loss=0.0001589, whisper_loss=0.09923, over 22772.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001553, whisper_loss=0.09133, over 3912290.83 frames. ], batch size: 93, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:47:25,519 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2024-08-14 16:47:28,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2024-08-14 16:47:34,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.337e+01 2.636e+01 2.855e+01 4.364e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 16:47:36,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2752620.0, ans=0.125 2024-08-14 16:47:41,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2752620.0, ans=0.125 2024-08-14 16:47:51,090 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 16:48:03,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2752820.0, ans=0.2 2024-08-14 16:48:03,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2752820.0, ans=0.0 2024-08-14 16:48:16,614 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 16:48:20,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752920.0, ans=0.1 2024-08-14 16:48:29,874 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 33 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 16:48:33,106 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-14 16:48:34,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 19, batch 14450, loss[loss=0.08953, beats_loss=0.0115, ecapa_loss=0.0001305, whisper_loss=0.07672, over 23040.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001561, whisper_loss=0.09058, over 3902556.81 frames. ], batch size: 92, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:48:40,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2753020.0, ans=0.0 2024-08-14 16:48:52,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2753120.0, ans=0.125 2024-08-14 16:49:17,916 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 16:49:26,106 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 16:50:13,403 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 0, loss[loss=0.07931, beats_loss=0.009554, ecapa_loss=0.0001363, whisper_loss=0.0684, over 19105.00 frames. ], tot_loss[loss=0.07931, beats_loss=0.009554, ecapa_loss=0.0001363, whisper_loss=0.0684, over 19105.00 frames. ], batch size: 74, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:50:13,404 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 16:50:23,279 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5818, 4.0224, 4.3261, 4.4934], device='cuda:1') 2024-08-14 16:50:36,190 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6702, 1.5012, 2.5349, 2.4414], device='cuda:1') 2024-08-14 16:50:50,461 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005431, whisper_loss=0.2478, over 922467.00 frames. 2024-08-14 16:51:07,102 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on SV_voxceleb1: loss=0.004351, beats_loss=0, ecapa_loss=0.0004351, whisper_loss=0, over 939242.00 frames. 2024-08-14 16:52:53,086 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on AT_audioset: loss=0.02356, beats_loss=0.02356, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 16:52:53,089 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 16:53:00,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2753420.0, ans=0.125 2024-08-14 16:53:42,150 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 16:53:46,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.322e+01 2.623e+01 2.945e+01 5.325e+01, threshold=5.246e+01, percent-clipped=1.0 2024-08-14 16:53:50,039 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 16:53:51,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2753620.0, ans=0.125 2024-08-14 16:54:07,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2753720.0, ans=0.125 2024-08-14 16:54:40,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.06 vs. limit=5.0 2024-08-14 16:54:56,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 50, loss[loss=0.08503, beats_loss=0.01024, ecapa_loss=0.0001417, whisper_loss=0.07337, over 18008.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.00999, ecapa_loss=0.0001626, whisper_loss=0.08969, over 897662.92 frames. ], batch size: 74, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:55:07,795 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 16:55:11,849 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 16:55:40,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-08-14 16:55:53,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.30 vs. 
limit=15.0 2024-08-14 16:55:59,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0 2024-08-14 16:56:52,175 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 100, loss[loss=0.0902, beats_loss=0.009887, ecapa_loss=0.0001479, whisper_loss=0.07883, over 17761.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.009911, ecapa_loss=0.0001574, whisper_loss=0.08867, over 1557832.63 frames. ], batch size: 71, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:56:56,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2754420.0, ans=0.1 2024-08-14 16:57:13,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2754520.0, ans=0.04949747468305833 2024-08-14 16:57:15,233 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 16:57:17,181 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 16:57:19,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-14 16:57:38,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.579e+01 2.856e+01 3.069e+01 3.660e+02, threshold=5.711e+01, percent-clipped=1.0 2024-08-14 16:58:04,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2024-08-14 16:58:12,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2754720.0, ans=0.125 2024-08-14 16:58:20,072 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 36 from Vox, 27 fro AS 2024-08-14 16:58:24,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2754820.0, ans=0.0 2024-08-14 16:58:38,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 150, loss[loss=0.1262, beats_loss=0.009256, ecapa_loss=0.000124, whisper_loss=0.1157, over 22194.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.00976, ecapa_loss=0.0001561, whisper_loss=0.08987, over 2080149.51 frames. ], batch size: 83, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:58:56,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2754920.0, ans=0.0 2024-08-14 16:58:59,653 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 28 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-14 17:00:03,752 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 200, loss[loss=0.1108, beats_loss=0.01066, ecapa_loss=0.000143, whisper_loss=0.09869, over 20315.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009966, ecapa_loss=0.0001537, whisper_loss=0.09081, over 2502606.04 frames. ], batch size: 81, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:00:08,160 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 17:00:12,280 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 17:00:32,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2755520.0, ans=0.2 2024-08-14 17:00:35,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2755620.0, ans=0.125 2024-08-14 17:00:36,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.515e+01 2.860e+01 3.195e+01 5.864e+01, threshold=5.719e+01, percent-clipped=1.0 2024-08-14 17:00:37,348 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 17:01:00,309 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 17:01:07,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2755820.0, ans=0.0 2024-08-14 17:01:13,421 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-08-14 17:01:15,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2755820.0, ans=0.1 2024-08-14 17:01:18,200 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 250, loss[loss=0.1064, beats_loss=0.01039, ecapa_loss=0.000127, whisper_loss=0.09478, over 18104.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01015, ecapa_loss=0.0001541, whisper_loss=0.09096, over 2807667.19 frames. 
], batch size: 68, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:01:20,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2755920.0, ans=0.0 2024-08-14 17:01:34,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2756020.0, ans=0.0 2024-08-14 17:01:43,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2756020.0, ans=0.2 2024-08-14 17:01:46,075 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 17:01:46,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2756120.0, ans=0.0 2024-08-14 17:01:50,253 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 17:01:50,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2756120.0, ans=0.125 2024-08-14 17:01:52,033 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 17:02:03,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2756220.0, ans=0.0 2024-08-14 17:02:24,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2756320.0, ans=0.2 2024-08-14 17:02:30,032 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 300, loss[loss=0.09177, beats_loss=0.01164, ecapa_loss=0.0001418, whisper_loss=0.07871, over 19839.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001548, whisper_loss=0.0899, over 3002656.74 frames. ], batch size: 76, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:02:32,826 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
19 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 17:02:40,638 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 17:02:40,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2756420.0, ans=0.0 2024-08-14 17:02:54,765 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 17:02:57,247 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-14 17:03:00,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.324e+01 2.572e+01 2.876e+01 1.018e+02, threshold=5.143e+01, percent-clipped=1.0 2024-08-14 17:03:24,331 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 17:03:41,667 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 350, loss[loss=0.08957, beats_loss=0.009674, ecapa_loss=0.0001638, whisper_loss=0.07825, over 15484.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001541, whisper_loss=0.08892, over 3201667.85 frames. ], batch size: 60, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:03:47,535 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 32 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 17:03:49,484 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:03:51,660 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 17:03:59,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2757020.0, ans=0.125 2024-08-14 17:04:03,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2757020.0, ans=0.0 2024-08-14 17:04:05,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=12.0 2024-08-14 17:04:05,545 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 35 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 17:04:29,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.51 vs. limit=22.5 2024-08-14 17:04:41,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2757320.0, ans=0.125 2024-08-14 17:04:43,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-14 17:04:52,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 400, loss[loss=0.09677, beats_loss=0.01025, ecapa_loss=0.0001335, whisper_loss=0.08518, over 19854.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001526, whisper_loss=0.08932, over 3320896.91 frames. ], batch size: 76, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:04:59,718 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-14 17:05:06,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757520.0, ans=0.0 2024-08-14 17:05:22,153 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 17:05:23,255 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.550e+01 2.888e+01 2.244e+02, threshold=5.100e+01, percent-clipped=1.0 2024-08-14 17:05:30,097 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 17:05:31,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2757620.0, ans=0.125 2024-08-14 17:05:33,984 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 17:05:39,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757720.0, ans=0.0 2024-08-14 17:05:40,154 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 17:05:40,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2757720.0, ans=0.1 2024-08-14 17:05:47,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2757720.0, ans=0.125 2024-08-14 17:05:59,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2757820.0, ans=10.0 2024-08-14 17:06:07,288 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 450, loss[loss=0.09018, beats_loss=0.01198, ecapa_loss=0.0001583, whisper_loss=0.07662, over 20962.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001533, whisper_loss=0.08882, over 3431938.68 frames. 
], batch size: 86, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:06:12,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2757920.0, ans=0.025 2024-08-14 17:06:18,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2757920.0, ans=0.125 2024-08-14 17:06:30,253 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-14 17:06:41,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2758120.0, ans=0.0 2024-08-14 17:06:54,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2758220.0, ans=0.0 2024-08-14 17:07:09,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5 2024-08-14 17:07:11,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2758320.0, ans=0.125 2024-08-14 17:07:25,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2758320.0, ans=0.125 2024-08-14 17:07:29,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 500, loss[loss=0.09992, beats_loss=0.009639, ecapa_loss=0.0001498, whisper_loss=0.08878, over 23543.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001536, whisper_loss=0.09005, over 3516679.15 frames. ], batch size: 91, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:07:38,538 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 17:07:58,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2758520.0, ans=0.125 2024-08-14 17:08:03,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.300e+01 2.536e+01 2.836e+01 8.494e+01, threshold=5.071e+01, percent-clipped=3.0 2024-08-14 17:08:08,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2758620.0, ans=0.2 2024-08-14 17:08:12,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2758620.0, ans=0.125 2024-08-14 17:08:17,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2758720.0, ans=0.1 2024-08-14 17:08:20,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2024-08-14 17:08:23,188 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 17:08:51,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 550, loss[loss=0.1175, beats_loss=0.01039, ecapa_loss=0.0001317, whisper_loss=0.1058, over 23558.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001516, whisper_loss=0.08987, over 3608479.71 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:08:52,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2758920.0, ans=0.2 2024-08-14 17:09:00,238 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 17:09:04,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2758920.0, ans=0.125 2024-08-14 17:09:05,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2758920.0, ans=0.04949747468305833 2024-08-14 17:09:42,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2759120.0, ans=0.0 2024-08-14 17:09:43,269 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 17:10:18,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 600, loss[loss=0.1231, beats_loss=0.008251, ecapa_loss=0.0001439, whisper_loss=0.1134, over 22270.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001509, whisper_loss=0.0904, over 3697208.73 frames. ], batch size: 85, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:10:22,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2759420.0, ans=0.2 2024-08-14 17:10:26,366 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 17:10:29,087 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 17:10:55,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.286e+01 2.611e+01 2.966e+01 2.824e+02, threshold=5.221e+01, percent-clipped=2.0 2024-08-14 17:11:11,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2759720.0, ans=0.125 2024-08-14 17:11:19,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2759720.0, ans=0.07 2024-08-14 17:11:26,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2759820.0, ans=0.0 2024-08-14 17:11:31,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2759820.0, ans=0.125 2024-08-14 17:11:45,485 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 650, loss[loss=0.1164, beats_loss=0.009209, ecapa_loss=0.0001633, whisper_loss=0.1055, over 23553.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001512, whisper_loss=0.09075, over 3722139.84 frames. ], batch size: 92, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:11:49,563 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 17:11:55,386 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 17:12:02,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2759920.0, ans=0.0 2024-08-14 17:12:23,802 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 17:12:30,711 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 17:12:32,377 WARNING [optim.py:496] (1/4) Scaling gradients by 0.059259023517370224, model_norm_threshold=52.210243225097656 2024-08-14 17:12:32,559 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.800e+04, grad_sumsq=2.525e+04, orig_rms_sq=3.485e+00 2024-08-14 17:12:34,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2760120.0, ans=0.125 2024-08-14 17:12:37,354 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 17:12:43,971 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-14 17:12:46,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2760220.0, ans=0.1 2024-08-14 17:12:54,039 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 17:12:55,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2760320.0, ans=0.125 2024-08-14 17:12:57,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2760320.0, ans=0.0 2024-08-14 17:13:05,091 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 17:13:11,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 700, loss[loss=0.1175, beats_loss=0.008515, ecapa_loss=0.0001885, whisper_loss=0.1071, over 14476.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001535, whisper_loss=0.09124, over 3735617.40 frames. ], batch size: 58, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:13:11,809 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 17:13:35,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.07 vs. limit=22.5 2024-08-14 17:13:36,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.09 vs. limit=10.0 2024-08-14 17:13:39,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2760520.0, ans=0.0 2024-08-14 17:13:39,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2760520.0, ans=0.125 2024-08-14 17:13:44,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2760620.0, ans=0.125 2024-08-14 17:13:46,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.378e+01 2.624e+01 2.914e+01 8.811e+02, threshold=5.248e+01, percent-clipped=3.0 2024-08-14 17:13:47,120 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 31 from Vox, 23 fro AS 2024-08-14 17:14:02,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-14 17:14:10,142 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 17:14:36,170 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 750, loss[loss=0.1023, beats_loss=0.00945, ecapa_loss=0.000188, whisper_loss=0.091, over 15962.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0104, ecapa_loss=0.0001527, whisper_loss=0.09154, over 3732068.05 frames. 
], batch size: 65, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:14:38,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2760920.0, ans=0.0 2024-08-14 17:14:42,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-08-14 17:14:48,445 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 17:14:50,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2760920.0, ans=0.2 2024-08-14 17:15:10,172 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 17:15:24,384 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=12.0 2024-08-14 17:15:28,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2761220.0, ans=0.125 2024-08-14 17:15:37,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2761220.0, ans=0.0 2024-08-14 17:15:38,421 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 17:15:54,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2761320.0, ans=0.125 2024-08-14 17:15:56,950 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-14 17:16:00,673 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 800, loss[loss=0.08935, beats_loss=0.01048, ecapa_loss=0.0001378, whisper_loss=0.07749, over 22411.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01037, ecapa_loss=0.0001531, whisper_loss=0.09117, over 3740237.19 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:16:05,495 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 17:16:14,172 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-14 17:16:17,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2761520.0, ans=0.0 2024-08-14 17:16:33,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.284e+01 2.457e+01 2.814e+01 4.816e+01, threshold=4.915e+01, percent-clipped=0.0 2024-08-14 17:16:33,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2761620.0, ans=0.2 2024-08-14 17:16:47,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=22.5 2024-08-14 17:16:59,403 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 17:17:06,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2024-08-14 17:17:08,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.47 vs. limit=22.5 2024-08-14 17:17:18,549 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 850, loss[loss=0.1233, beats_loss=0.01066, ecapa_loss=0.0001733, whisper_loss=0.1109, over 21947.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001522, whisper_loss=0.09074, over 3744776.34 frames. 
], batch size: 89, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:17:37,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2762020.0, ans=0.125 2024-08-14 17:17:46,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2762020.0, ans=0.125 2024-08-14 17:17:55,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2762120.0, ans=0.125 2024-08-14 17:17:55,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2762120.0, ans=0.125 2024-08-14 17:18:12,765 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 21 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 17:18:14,454 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 17:18:43,488 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 900, loss[loss=0.06898, beats_loss=0.01172, ecapa_loss=0.0001293, whisper_loss=0.05597, over 15232.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01038, ecapa_loss=0.0001516, whisper_loss=0.09137, over 3768608.85 frames. ], batch size: 58, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:18:49,847 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 17:19:19,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2762620.0, ans=0.125 2024-08-14 17:19:21,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.291e+01 2.536e+01 2.780e+01 9.206e+01, threshold=5.071e+01, percent-clipped=1.0 2024-08-14 17:19:26,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2024-08-14 17:19:54,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2762820.0, ans=0.125 2024-08-14 17:19:56,241 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-14 17:20:06,502 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 950, loss[loss=0.09951, beats_loss=0.009612, ecapa_loss=0.0001382, whisper_loss=0.08852, over 17715.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001507, whisper_loss=0.09119, over 3787838.19 frames. ], batch size: 67, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:20:43,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2763020.0, ans=0.1 2024-08-14 17:20:55,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2763120.0, ans=0.1 2024-08-14 17:20:56,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2763120.0, ans=0.125 2024-08-14 17:21:05,933 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 17:21:21,913 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:21:51,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2024-08-14 17:21:54,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1000, loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001518, whisper_loss=0.0909, over 16958.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001506, whisper_loss=0.08977, over 3771067.66 frames. ], batch size: 64, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:22:01,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2763420.0, ans=0.2 2024-08-14 17:22:08,331 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 17:22:10,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2763420.0, ans=0.2 2024-08-14 17:22:28,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2763520.0, ans=0.2 2024-08-14 17:22:28,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2763520.0, ans=0.125 2024-08-14 17:22:33,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.269e+01 2.540e+01 2.773e+01 4.748e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-14 17:23:28,691 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 17:23:36,628 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1050, loss[loss=0.1121, beats_loss=0.008039, ecapa_loss=0.0001602, whisper_loss=0.1024, over 16465.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001508, whisper_loss=0.09055, over 3807104.14 frames. ], batch size: 61, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:23:53,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2763920.0, ans=0.125 2024-08-14 17:24:12,347 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 17:24:24,942 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 17:24:44,962 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=15.0 2024-08-14 17:24:58,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2024-08-14 17:25:22,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2764320.0, ans=0.0 2024-08-14 17:25:24,050 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 17:25:36,567 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1100, loss[loss=0.1112, beats_loss=0.01181, ecapa_loss=0.000151, whisper_loss=0.09789, over 23211.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001516, whisper_loss=0.09112, over 3816020.20 frames. 
], batch size: 90, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:25:59,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2764520.0, ans=0.125 2024-08-14 17:26:06,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2764520.0, ans=0.2 2024-08-14 17:26:12,313 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:26:25,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2764620.0, ans=0.1 2024-08-14 17:26:29,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.358e+01 2.558e+01 2.908e+01 1.671e+02, threshold=5.116e+01, percent-clipped=1.0 2024-08-14 17:26:30,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2764620.0, ans=0.125 2024-08-14 17:26:52,339 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-14 17:26:56,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-08-14 17:27:07,196 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 17:27:39,313 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1150, loss[loss=0.07313, beats_loss=0.01015, ecapa_loss=0.000151, whisper_loss=0.06148, over 15496.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.000152, whisper_loss=0.09069, over 3822130.09 frames. 
], batch size: 61, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:28:32,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2765120.0, ans=0.0 2024-08-14 17:28:37,573 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-14 17:28:44,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2765120.0, ans=0.125 2024-08-14 17:29:01,104 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 17:29:20,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2765320.0, ans=0.0 2024-08-14 17:29:22,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2024-08-14 17:29:24,231 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1200, loss[loss=0.07779, beats_loss=0.01047, ecapa_loss=0.0001712, whisper_loss=0.0656, over 16205.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001513, whisper_loss=0.09027, over 3840209.94 frames. ], batch size: 64, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:29:29,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2024-08-14 17:29:34,504 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 17:29:54,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.420e+01 2.672e+01 3.121e+01 5.638e+01, threshold=5.344e+01, percent-clipped=1.0 2024-08-14 17:30:12,664 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
20 from LS+wenet, 15 from Vox, 29 from AS 2024-08-14 17:30:38,747 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1250, loss[loss=0.1313, beats_loss=0.008746, ecapa_loss=0.0001482, whisper_loss=0.1211, over 23462.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001511, whisper_loss=0.09047, over 3808796.20 frames. ], batch size: 88, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:30:49,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2765920.0, ans=0.0 2024-08-14 17:30:53,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-08-14 17:31:06,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-08-14 17:31:13,938 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 from AS 2024-08-14 17:31:17,899 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 from AS 2024-08-14 17:31:28,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2766220.0, ans=0.1 2024-08-14 17:31:29,680 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 14 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 17:31:31,354 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:31:58,373 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1300, loss[loss=0.1116, beats_loss=0.00966, ecapa_loss=0.0001568, whisper_loss=0.1004, over 15954.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001513, whisper_loss=0.08982, over 3792492.06 frames. 
], batch size: 62, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:32:01,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2766420.0, ans=0.1 2024-08-14 17:32:03,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-14 17:32:08,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2766420.0, ans=0.125 2024-08-14 17:32:09,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2766420.0, ans=0.0 2024-08-14 17:32:14,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.38 vs. limit=22.5 2024-08-14 17:32:17,957 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 from AS 2024-08-14 17:32:19,303 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
17 from LS+wenet, 18 from Vox, 21 from AS 2024-08-14 17:32:21,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2766520.0, ans=0.09899494936611666 2024-08-14 17:32:21,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2766520.0, ans=0.125 2024-08-14 17:32:23,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2766520.0, ans=0.2 2024-08-14 17:32:29,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2766620.0, ans=0.125 2024-08-14 17:32:31,531 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.335e+01 2.518e+01 2.895e+01 4.834e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-14 17:32:39,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2766620.0, ans=0.07 2024-08-14 17:32:54,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2766720.0, ans=0.1 2024-08-14 17:33:02,007 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 30 from Vox, 27 from AS 2024-08-14 17:33:02,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2766820.0, ans=0.1 2024-08-14 17:33:03,765 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 17:33:05,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2766820.0, ans=0.0 2024-08-14 17:33:17,797 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1350, loss[loss=0.09264, beats_loss=0.01144, ecapa_loss=0.0001133, whisper_loss=0.08007, over 15758.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01067, ecapa_loss=0.0001499, whisper_loss=0.09, over 3827208.75 frames. ], batch size: 60, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:33:46,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2767020.0, ans=0.125 2024-08-14 17:33:46,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-14 17:33:53,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2767120.0, ans=0.1 2024-08-14 17:34:21,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2767220.0, ans=0.125 2024-08-14 17:34:35,808 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 from AS 2024-08-14 17:34:42,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1400, loss[loss=0.09138, beats_loss=0.01003, ecapa_loss=0.0001089, whisper_loss=0.08026, over 16563.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001501, whisper_loss=0.09028, over 3810062.96 frames. ], batch size: 59, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:34:44,968 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 16 from LS+wenet, 23 from Vox, 41 from AS 2024-08-14 17:34:49,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2767420.0, ans=0.125 2024-08-14 17:34:55,867 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
29 from LS+wenet, 29 from Vox, 36 from AS 2024-08-14 17:35:09,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2767520.0, ans=0.125 2024-08-14 17:35:11,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2767520.0, ans=0.0 2024-08-14 17:35:16,934 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 from AS 2024-08-14 17:35:18,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.291e+01 2.563e+01 2.822e+01 1.881e+02, threshold=5.126e+01, percent-clipped=2.0 2024-08-14 17:35:22,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2767620.0, ans=0.2 2024-08-14 17:35:34,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2767720.0, ans=0.1 2024-08-14 17:36:41,313 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1450, loss[loss=0.1107, beats_loss=0.00745, ecapa_loss=0.0001711, whisper_loss=0.1016, over 15091.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001507, whisper_loss=0.09033, over 3786678.43 frames. ], batch size: 58, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:36:54,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2767920.0, ans=0.125 2024-08-14 17:36:59,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2768020.0, ans=0.0 2024-08-14 17:37:06,041 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 23 from Vox, 29 from AS 2024-08-14 17:37:16,124 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 25 from Vox, 27 from AS 2024-08-14 17:37:16,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2768120.0, ans=0.0 2024-08-14 17:37:21,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2768120.0, ans=0.125 2024-08-14 17:37:22,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2768120.0, ans=0.125 2024-08-14 17:37:50,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-08-14 17:37:53,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2768320.0, ans=0.125 2024-08-14 17:38:00,141 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 from AS 2024-08-14 17:38:03,770 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1500, loss[loss=0.1187, beats_loss=0.009735, ecapa_loss=0.000152, whisper_loss=0.1075, over 22329.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001501, whisper_loss=0.09048, over 3772020.02 frames. 
], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:38:11,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2768420.0, ans=0.2 2024-08-14 17:38:26,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2768520.0, ans=0.125 2024-08-14 17:38:37,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.247e+01 2.495e+01 2.740e+01 8.085e+01, threshold=4.990e+01, percent-clipped=1.0 2024-08-14 17:38:41,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=15.0 2024-08-14 17:38:43,994 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 from AS 2024-08-14 17:38:44,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2768620.0, ans=0.125 2024-08-14 17:39:18,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2024-08-14 17:39:26,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1550, loss[loss=0.09865, beats_loss=0.009947, ecapa_loss=0.0001444, whisper_loss=0.08726, over 23159.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001493, whisper_loss=0.08944, over 3775427.36 frames. ], batch size: 90, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:39:31,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2768920.0, ans=0.125 2024-08-14 17:39:39,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2768920.0, ans=0.0 2024-08-14 17:39:42,400 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
18 from LS+wenet, 14 from Vox, 39 from AS 2024-08-14 17:40:07,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2769120.0, ans=0.0 2024-08-14 17:40:11,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2024-08-14 17:40:21,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2769220.0, ans=0.125 2024-08-14 17:40:30,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2769320.0, ans=0.0 2024-08-14 17:40:36,226 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 from AS 2024-08-14 17:40:45,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1600, loss[loss=0.09371, beats_loss=0.009837, ecapa_loss=0.0001833, whisper_loss=0.08204, over 18442.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001489, whisper_loss=0.09045, over 3804083.62 frames. ], batch size: 75, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:41:14,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2769520.0, ans=0.2 2024-08-14 17:41:17,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.359e+01 2.603e+01 2.856e+01 4.128e+01, threshold=5.205e+01, percent-clipped=0.0 2024-08-14 17:41:21,444 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 17:41:47,738 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
39 from LS+wenet, 21 from Vox, 30 from AS 2024-08-14 17:41:49,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2769820.0, ans=0.0 2024-08-14 17:41:52,870 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 30 from Vox, 26 from AS 2024-08-14 17:42:01,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1650, loss[loss=0.1033, beats_loss=0.009657, ecapa_loss=0.0001623, whisper_loss=0.09203, over 21398.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.000149, whisper_loss=0.09067, over 3829408.15 frames. ], batch size: 85, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:42:03,413 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 18 from Vox, 39 from AS 2024-08-14 17:42:23,611 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 from AS 2024-08-14 17:42:28,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2770020.0, ans=0.04949747468305833 2024-08-14 17:42:31,678 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2024-08-14 17:42:37,714 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 14 from Vox, 31 from AS 2024-08-14 17:42:50,854 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2024-08-14 17:42:53,514 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS 2024-08-14 17:42:56,628 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.994e+00 2024-08-14 17:43:00,843 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
16 from LS+wenet, 23 from Vox, 29 from AS 2024-08-14 17:43:02,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2770320.0, ans=0.04949747468305833 2024-08-14 17:43:06,809 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 25 from Vox, 21 from AS 2024-08-14 17:43:14,776 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.080e-02 2024-08-14 17:43:18,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1700, loss[loss=0.1197, beats_loss=0.008555, ecapa_loss=0.0001493, whisper_loss=0.1097, over 20903.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001482, whisper_loss=0.09097, over 3839492.48 frames. ], batch size: 77, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:43:26,528 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 from AS 2024-08-14 17:43:51,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.335e+01 2.576e+01 2.933e+01 1.462e+02, threshold=5.153e+01, percent-clipped=1.0 2024-08-14 17:44:05,012 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 from AS 2024-08-14 17:44:14,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2770720.0, ans=0.0 2024-08-14 17:44:32,357 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 from AS 2024-08-14 17:44:34,713 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1750, loss[loss=0.119, beats_loss=0.01096, ecapa_loss=0.00011, whisper_loss=0.1069, over 24868.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001477, whisper_loss=0.09082, over 3848333.33 frames. 
], batch size: 93, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:44:37,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2770920.0, ans=0.125 2024-08-14 17:44:39,604 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 14 from LS+wenet, 13 from Vox, 45 from AS 2024-08-14 17:44:41,462 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 from AS 2024-08-14 17:45:03,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2771120.0, ans=0.2 2024-08-14 17:45:15,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2771120.0, ans=0.05 2024-08-14 17:45:39,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2024-08-14 17:45:50,238 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1800, loss[loss=0.0969, beats_loss=0.01003, ecapa_loss=0.0001466, whisper_loss=0.0854, over 21286.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001476, whisper_loss=0.09067, over 3857187.53 frames. ], batch size: 83, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:45:50,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771420.0, ans=0.1 2024-08-14 17:46:00,872 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
23 from LS+wenet, 19 from Vox, 27 from AS 2024-08-14 17:46:17,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2771520.0, ans=0.0 2024-08-14 17:46:17,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2771520.0, ans=0.125 2024-08-14 17:46:22,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.297e+01 2.564e+01 2.917e+01 8.345e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-14 17:46:34,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2771620.0, ans=0.125 2024-08-14 17:46:42,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-08-14 17:46:51,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2771820.0, ans=0.0 2024-08-14 17:47:06,287 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1850, loss[loss=0.09869, beats_loss=0.009073, ecapa_loss=0.0001774, whisper_loss=0.08785, over 16994.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.0001491, whisper_loss=0.09116, over 3854534.66 frames. 
], batch size: 67, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:47:38,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2772120.0, ans=0.125 2024-08-14 17:47:55,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2772220.0, ans=0.0 2024-08-14 17:47:55,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2772220.0, ans=0.0 2024-08-14 17:48:07,138 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 26 from Vox, 33 from AS 2024-08-14 17:48:08,904 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.57 vs. limit=22.5 2024-08-14 17:48:17,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-08-14 17:48:21,570 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1900, loss[loss=0.115, beats_loss=0.009315, ecapa_loss=0.0001384, whisper_loss=0.1043, over 16634.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001491, whisper_loss=0.0907, over 3831351.14 frames. ], batch size: 63, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:48:25,006 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 from AS 2024-08-14 17:48:25,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2772420.0, ans=0.2 2024-08-14 17:48:32,088 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.50 vs. 
limit=12.0 2024-08-14 17:48:35,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-14 17:48:35,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=12.0 2024-08-14 17:48:48,314 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 27 from LS+wenet, 14 from Vox, 24 from AS 2024-08-14 17:48:53,593 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-14 17:48:54,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.272e+01 2.538e+01 2.800e+01 8.979e+01, threshold=5.075e+01, percent-clipped=2.0 2024-08-14 17:48:58,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2772620.0, ans=0.0 2024-08-14 17:49:07,272 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 from AS 2024-08-14 17:49:15,932 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 from AS 2024-08-14 17:49:23,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-14 17:49:37,921 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 1950, loss[loss=0.108, beats_loss=0.0103, ecapa_loss=0.000146, whisper_loss=0.09624, over 22197.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001486, whisper_loss=0.09014, over 3816169.49 frames. 
], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:49:49,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2772920.0, ans=0.1 2024-08-14 17:49:57,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2773020.0, ans=0.035 2024-08-14 17:50:12,095 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.119e-02 2024-08-14 17:50:30,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2773220.0, ans=0.07 2024-08-14 17:50:33,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2773220.0, ans=0.0 2024-08-14 17:50:38,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2773220.0, ans=0.05 2024-08-14 17:50:44,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2773320.0, ans=0.125 2024-08-14 17:50:47,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-14 17:50:55,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2773420.0, ans=0.125 2024-08-14 17:50:56,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2000, loss[loss=0.1213, beats_loss=0.009155, ecapa_loss=0.0001361, whisper_loss=0.1108, over 17529.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001478, whisper_loss=0.09032, over 3817175.86 frames. 
], batch size: 66, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:51:19,291 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 from AS 2024-08-14 17:51:23,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2773520.0, ans=0.125 2024-08-14 17:51:29,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.380e+01 2.636e+01 2.886e+01 1.186e+02, threshold=5.271e+01, percent-clipped=1.0 2024-08-14 17:51:37,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2773620.0, ans=0.0 2024-08-14 17:51:47,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-14 17:51:58,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2773820.0, ans=0.95 2024-08-14 17:52:02,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2773820.0, ans=0.0 2024-08-14 17:52:14,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2050, loss[loss=0.115, beats_loss=0.008427, ecapa_loss=0.0001515, whisper_loss=0.1051, over 18826.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001484, whisper_loss=0.09001, over 3842403.14 frames. ], batch size: 70, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:52:29,192 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 from AS 2024-08-14 17:52:34,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. 
limit=15.0 2024-08-14 17:52:43,086 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-14 17:52:47,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2774120.0, ans=0.125 2024-08-14 17:52:48,835 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 12 from Vox, 33 from AS 2024-08-14 17:52:56,843 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 from AS 2024-08-14 17:53:07,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2024-08-14 17:53:15,954 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 from AS 2024-08-14 17:53:26,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2774320.0, ans=0.0 2024-08-14 17:53:30,575 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2100, loss[loss=0.1023, beats_loss=0.009452, ecapa_loss=0.0001583, whisper_loss=0.09122, over 15906.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01067, ecapa_loss=0.0001485, whisper_loss=0.08917, over 3823056.79 frames. ], batch size: 63, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:53:31,079 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:53:42,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-08-14 17:53:46,470 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
19 from LS+wenet, 22 from Vox, 32 from AS 2024-08-14 17:54:03,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.339e+01 2.575e+01 2.832e+01 4.254e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-14 17:54:12,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2774620.0, ans=0.125 2024-08-14 17:54:22,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2774720.0, ans=0.125 2024-08-14 17:54:24,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2774720.0, ans=0.0 2024-08-14 17:54:26,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=12.0 2024-08-14 17:54:27,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2774720.0, ans=0.125 2024-08-14 17:54:36,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2774820.0, ans=0.2 2024-08-14 17:54:38,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2774820.0, ans=0.125 2024-08-14 17:54:44,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2774820.0, ans=0.125 2024-08-14 17:54:47,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2774920.0, ans=0.1 2024-08-14 17:54:48,977 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2150, loss[loss=0.1076, beats_loss=0.009315, ecapa_loss=0.000152, whisper_loss=0.09677, over 17696.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0107, ecapa_loss=0.0001492, whisper_loss=0.08926, over 3816765.47 frames. 
], batch size: 69, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:55:02,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2775020.0, ans=0.2 2024-08-14 17:55:29,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=15.0 2024-08-14 17:55:32,033 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 25 from Vox, 36 from AS 2024-08-14 17:55:40,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2775220.0, ans=0.125 2024-08-14 17:55:44,097 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-08-14 17:55:49,539 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 33 from Vox, 32 from AS 2024-08-14 17:55:58,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2775320.0, ans=0.125 2024-08-14 17:55:59,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-14 17:56:06,064 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2200, loss[loss=0.106, beats_loss=0.01214, ecapa_loss=0.0001591, whisper_loss=0.09225, over 21865.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01073, ecapa_loss=0.0001494, whisper_loss=0.08944, over 3826355.59 frames. 
], batch size: 88, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:56:09,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2775420.0, ans=0.125 2024-08-14 17:56:36,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.388e+01 2.686e+01 3.163e+01 6.240e+01, threshold=5.371e+01, percent-clipped=1.0 2024-08-14 17:56:43,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2775620.0, ans=0.2 2024-08-14 17:57:04,561 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 17:57:10,858 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 17:57:18,931 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 36 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 17:57:19,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. limit=6.0 2024-08-14 17:57:21,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2250, loss[loss=0.1068, beats_loss=0.01219, ecapa_loss=0.0001283, whisper_loss=0.09335, over 22661.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001501, whisper_loss=0.08991, over 3832965.21 frames. 
], batch size: 90, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:57:22,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2775920.0, ans=0.1 2024-08-14 17:57:27,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2775920.0, ans=0.2 2024-08-14 17:57:32,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-14 17:57:33,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2775920.0, ans=0.1 2024-08-14 17:57:40,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2776020.0, ans=0.125 2024-08-14 17:57:54,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2776120.0, ans=0.125 2024-08-14 17:57:55,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2776120.0, ans=0.0 2024-08-14 17:57:58,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-14 17:58:40,935 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2300, loss[loss=0.09723, beats_loss=0.01062, ecapa_loss=0.000138, whisper_loss=0.08523, over 17410.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001497, whisper_loss=0.09105, over 3877414.86 frames. ], batch size: 69, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:58:41,140 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
19 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 17:58:58,386 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=12.0 2024-08-14 17:59:07,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2776520.0, ans=0.125 2024-08-14 17:59:12,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.401e+01 2.652e+01 3.055e+01 1.168e+02, threshold=5.304e+01, percent-clipped=4.0 2024-08-14 17:59:21,266 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 17:59:40,712 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 17:59:48,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2776820.0, ans=0.1 2024-08-14 17:59:57,825 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2350, loss[loss=0.1149, beats_loss=0.00779, ecapa_loss=0.0002152, whisper_loss=0.105, over 21604.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001527, whisper_loss=0.09172, over 3846865.59 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:59:59,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-14 18:00:06,308 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 18:00:25,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2777020.0, ans=0.125 2024-08-14 18:00:26,381 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
20 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-14 18:00:27,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2777020.0, ans=0.125 2024-08-14 18:00:43,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2777120.0, ans=0.1 2024-08-14 18:00:49,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2777220.0, ans=0.125 2024-08-14 18:00:50,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.17 vs. limit=15.0 2024-08-14 18:00:53,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2777220.0, ans=0.0 2024-08-14 18:01:04,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2777320.0, ans=0.0 2024-08-14 18:01:07,890 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 18:01:09,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2777320.0, ans=0.125 2024-08-14 18:01:12,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2777320.0, ans=0.0 2024-08-14 18:01:18,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2777420.0, ans=0.1 2024-08-14 18:01:19,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2400, loss[loss=0.123, beats_loss=0.008655, ecapa_loss=0.0001671, whisper_loss=0.1127, over 22479.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01045, ecapa_loss=0.0001538, whisper_loss=0.09213, over 3867907.24 frames. 
], batch size: 90, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:01:43,597 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 18:01:52,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.341e+01 2.588e+01 3.015e+01 2.629e+02, threshold=5.175e+01, percent-clipped=2.0 2024-08-14 18:02:08,078 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 18:02:16,435 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 18:02:20,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2777720.0, ans=0.0 2024-08-14 18:02:31,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2777820.0, ans=0.1 2024-08-14 18:02:42,126 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2450, loss[loss=0.09042, beats_loss=0.01618, ecapa_loss=0.0001082, whisper_loss=0.07316, over 23454.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001535, whisper_loss=0.09183, over 3890094.97 frames. ], batch size: 94, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:03:28,629 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 18:03:38,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2778220.0, ans=0.0 2024-08-14 18:03:44,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2778220.0, ans=0.0 2024-08-14 18:03:59,627 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 18:04:01,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2778320.0, ans=0.125 2024-08-14 18:04:03,851 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2500, loss[loss=0.08193, beats_loss=0.01044, ecapa_loss=0.0001816, whisper_loss=0.06967, over 14489.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01059, ecapa_loss=0.0001526, whisper_loss=0.09206, over 3899250.91 frames. ], batch size: 59, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:04:14,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2778420.0, ans=0.1 2024-08-14 18:04:27,519 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 18:04:32,560 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 18:04:39,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.393e+01 2.682e+01 2.958e+01 4.919e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-14 18:04:39,686 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 18:04:46,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2024-08-14 18:04:58,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2778720.0, ans=0.125 2024-08-14 18:05:02,215 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. 
limit=15.0 2024-08-14 18:05:24,973 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2550, loss[loss=0.1052, beats_loss=0.009715, ecapa_loss=0.0001534, whisper_loss=0.09394, over 19817.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01053, ecapa_loss=0.0001526, whisper_loss=0.09254, over 3893729.81 frames. ], batch size: 76, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:05:34,024 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-14 18:05:36,985 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 18:05:43,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2779020.0, ans=0.125 2024-08-14 18:05:46,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2779020.0, ans=0.125 2024-08-14 18:05:52,713 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 18:06:02,070 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 18:06:03,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2024-08-14 18:06:06,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-14 18:06:22,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2779220.0, ans=0.125 2024-08-14 18:06:22,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2779220.0, ans=0.125 2024-08-14 18:06:26,891 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 18:06:30,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2779320.0, ans=0.125 2024-08-14 18:06:38,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2779320.0, ans=0.0 2024-08-14 18:06:46,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2600, loss[loss=0.1105, beats_loss=0.009347, ecapa_loss=0.0001328, whisper_loss=0.09979, over 15706.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0105, ecapa_loss=0.0001532, whisper_loss=0.09277, over 3891574.49 frames. ], batch size: 59, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:06:53,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2779420.0, ans=0.1 2024-08-14 18:07:03,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2779520.0, ans=0.0 2024-08-14 18:07:05,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2779520.0, ans=0.0 2024-08-14 18:07:05,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2779520.0, ans=0.125 2024-08-14 18:07:08,265 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 18:07:21,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.268e+01 2.541e+01 2.782e+01 4.582e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-14 18:07:21,318 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 18:07:34,888 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 18:07:38,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2779720.0, ans=0.1 2024-08-14 18:07:50,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2779720.0, ans=0.125 2024-08-14 18:07:50,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2779720.0, ans=0.125 2024-08-14 18:07:53,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2779820.0, ans=0.125 2024-08-14 18:07:54,296 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 16 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 18:08:05,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2779820.0, ans=0.1 2024-08-14 18:08:07,905 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2650, loss[loss=0.09588, beats_loss=0.01169, ecapa_loss=0.0001698, whisper_loss=0.08249, over 19463.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01055, ecapa_loss=0.0001525, whisper_loss=0.09269, over 3889948.79 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:08:10,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-14 18:08:15,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2779920.0, ans=0.09899494936611666 2024-08-14 18:08:17,532 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 18:08:27,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2780020.0, ans=0.2 2024-08-14 18:08:38,617 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 18:09:29,942 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2700, loss[loss=0.1024, beats_loss=0.01312, ecapa_loss=0.0001347, whisper_loss=0.08793, over 20928.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001527, whisper_loss=0.09176, over 3890621.03 frames. ], batch size: 84, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:09:31,478 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 18:09:37,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2024-08-14 18:09:38,168 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.692e-02 2024-08-14 18:09:42,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2780420.0, ans=0.0 2024-08-14 18:09:49,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2780520.0, ans=0.0 2024-08-14 18:10:03,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.336e+01 2.550e+01 2.927e+01 5.134e+01, threshold=5.101e+01, percent-clipped=1.0 2024-08-14 18:10:03,822 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 18:10:15,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2780620.0, ans=0.1 2024-08-14 18:10:29,864 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
25 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 18:10:44,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2780820.0, ans=0.125 2024-08-14 18:10:49,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2780820.0, ans=0.125 2024-08-14 18:10:52,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2750, loss[loss=0.08855, beats_loss=0.01052, ecapa_loss=0.0001249, whisper_loss=0.07678, over 17829.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001517, whisper_loss=0.09094, over 3884744.61 frames. ], batch size: 67, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:10:52,459 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 18:10:55,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2780920.0, ans=0.2 2024-08-14 18:11:01,603 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 18:11:14,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2781020.0, ans=0.2 2024-08-14 18:11:15,790 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 18:11:21,580 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 18:11:30,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2781120.0, ans=0.035 2024-08-14 18:11:40,693 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 18:11:53,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2781220.0, ans=0.125 2024-08-14 18:11:54,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2781220.0, ans=0.125 2024-08-14 18:12:11,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2781320.0, ans=0.125 2024-08-14 18:12:16,183 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2800, loss[loss=0.1288, beats_loss=0.007758, ecapa_loss=0.0001459, whisper_loss=0.1195, over 16485.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001519, whisper_loss=0.09092, over 3863581.58 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:12:17,706 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 18:12:33,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2781520.0, ans=0.05 2024-08-14 18:12:41,228 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2024-08-14 18:12:45,776 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 18:12:48,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.377e+01 2.677e+01 2.938e+01 4.458e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-14 18:13:09,568 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:13:21,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2781820.0, ans=0.125 2024-08-14 18:13:21,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781820.0, ans=0.1 2024-08-14 18:13:25,126 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 18:13:33,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.13 vs. limit=22.5 2024-08-14 18:13:33,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2850, loss[loss=0.1058, beats_loss=0.01078, ecapa_loss=0.0001609, whisper_loss=0.09338, over 16879.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.000153, whisper_loss=0.09094, over 3865618.74 frames. ], batch size: 69, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:13:40,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2024-08-14 18:13:43,726 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 18:13:48,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2782020.0, ans=0.2 2024-08-14 18:14:07,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2782120.0, ans=0.125 2024-08-14 18:14:13,028 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 18:14:21,810 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 18:14:41,028 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 18:14:48,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2900, loss[loss=0.07151, beats_loss=0.01015, ecapa_loss=0.0001503, whisper_loss=0.05986, over 14737.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001525, whisper_loss=0.09011, over 3831214.92 frames. ], batch size: 60, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:14:48,300 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 18:14:52,493 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
25 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 18:15:18,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.302e+01 2.501e+01 2.806e+01 3.501e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-14 18:15:20,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2782620.0, ans=0.0 2024-08-14 18:15:29,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2782620.0, ans=0.125 2024-08-14 18:15:36,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2782720.0, ans=0.0 2024-08-14 18:15:56,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2782820.0, ans=0.07 2024-08-14 18:16:01,847 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 18:16:02,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=22.5 2024-08-14 18:16:03,367 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 2950, loss[loss=0.08806, beats_loss=0.01188, ecapa_loss=0.0001601, whisper_loss=0.07458, over 18696.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001555, whisper_loss=0.0912, over 3857555.82 frames. ], batch size: 77, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:16:03,613 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 18:16:10,419 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
12 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 18:16:13,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2782920.0, ans=0.2 2024-08-14 18:16:14,674 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 18:16:18,722 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 18:16:34,234 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 18:16:43,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-14 18:17:01,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2783220.0, ans=10.0 2024-08-14 18:17:01,921 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 18 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-14 18:17:18,122 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3000, loss[loss=0.09095, beats_loss=0.01314, ecapa_loss=0.0001281, whisper_loss=0.07653, over 20919.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001555, whisper_loss=0.09071, over 3890347.49 frames. ], batch size: 85, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:17:18,123 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 18:17:58,834 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on ASR_libri: loss=0.2511, beats_loss=0, ecapa_loss=0.0005401, whisper_loss=0.2457, over 922467.00 frames. 2024-08-14 18:18:19,533 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on SV_voxceleb1: loss=0.004329, beats_loss=0, ecapa_loss=0.0004329, whisper_loss=0, over 939242.00 frames. 
2024-08-14 18:20:16,581 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 18:20:16,585 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 18:20:27,418 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 18:20:48,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.403e+01 2.631e+01 2.938e+01 2.975e+02, threshold=5.261e+01, percent-clipped=1.0 2024-08-14 18:20:51,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2783620.0, ans=0.0 2024-08-14 18:21:03,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2783720.0, ans=0.05 2024-08-14 18:21:30,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3050, loss[loss=0.09839, beats_loss=0.009673, ecapa_loss=0.0001918, whisper_loss=0.0868, over 20172.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001552, whisper_loss=0.0913, over 3893043.22 frames. ], batch size: 86, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:21:31,932 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-14 18:21:37,136 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
33 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 18:21:37,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2783920.0, ans=0.125 2024-08-14 18:21:37,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2783920.0, ans=0.95 2024-08-14 18:21:46,363 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 18:21:56,985 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 18:22:09,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2784120.0, ans=0.0 2024-08-14 18:22:16,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2784220.0, ans=0.125 2024-08-14 18:22:19,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2784220.0, ans=0.125 2024-08-14 18:22:19,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2784220.0, ans=0.125 2024-08-14 18:22:27,065 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-14 18:22:30,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2784320.0, ans=0.5 2024-08-14 18:22:35,498 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 18:22:42,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2784320.0, ans=0.125 2024-08-14 18:22:43,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2784320.0, ans=0.0 2024-08-14 18:22:46,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3100, loss[loss=0.118, beats_loss=0.01082, ecapa_loss=0.0001059, whisper_loss=0.1061, over 19276.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001553, whisper_loss=0.09115, over 3889530.68 frames. ], batch size: 73, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:23:01,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2784520.0, ans=0.2 2024-08-14 18:23:03,501 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 18:23:16,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.365e+01 2.545e+01 2.848e+01 4.706e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-14 18:23:22,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2784620.0, ans=0.125 2024-08-14 18:23:25,170 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-14 18:23:29,462 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-14 18:23:34,678 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 18:23:35,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2784720.0, ans=0.125 2024-08-14 18:23:35,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2784720.0, ans=0.125 2024-08-14 18:23:46,335 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:23:51,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2784820.0, ans=0.2 2024-08-14 18:23:56,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3150, loss[loss=0.1322, beats_loss=0.006704, ecapa_loss=0.0001774, whisper_loss=0.1238, over 16776.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01066, ecapa_loss=0.0001545, whisper_loss=0.09214, over 3895546.85 frames. ], batch size: 64, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:23:58,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784920.0, ans=0.1 2024-08-14 18:24:13,873 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 18:24:27,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2785120.0, ans=0.0 2024-08-14 18:24:32,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2785120.0, ans=0.0 2024-08-14 18:24:45,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2785220.0, ans=0.0 2024-08-14 18:24:45,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2785220.0, ans=0.125 2024-08-14 18:24:49,080 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-14 18:24:52,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2785320.0, ans=0.125 2024-08-14 18:24:53,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2785320.0, ans=0.0 2024-08-14 18:25:06,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3200, loss[loss=0.106, beats_loss=0.01209, ecapa_loss=0.0001341, whisper_loss=0.09255, over 19530.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001539, whisper_loss=0.09173, over 3888134.71 frames. ], batch size: 76, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:25:09,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-14 18:25:18,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-08-14 18:25:25,939 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 18:25:34,558 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.500e+00 2024-08-14 18:25:35,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.274e+01 2.528e+01 2.834e+01 7.598e+01, threshold=5.056e+01, percent-clipped=2.0 2024-08-14 18:25:40,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2785620.0, ans=0.125 2024-08-14 18:25:45,352 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-14 18:25:49,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2785720.0, ans=0.2 2024-08-14 18:25:51,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2785720.0, ans=0.125 2024-08-14 18:25:59,038 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.518e-03 2024-08-14 18:26:04,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-14 18:26:15,108 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3250, loss[loss=0.1184, beats_loss=0.005687, ecapa_loss=0.000168, whisper_loss=0.111, over 15799.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.0001542, whisper_loss=0.09179, over 3847447.03 frames. 
], batch size: 58, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:26:39,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2786020.0, ans=0.0 2024-08-14 18:27:22,310 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3300, loss[loss=0.07149, beats_loss=0.0126, ecapa_loss=0.0001568, whisper_loss=0.05732, over 16040.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001537, whisper_loss=0.09122, over 3848755.12 frames. ], batch size: 65, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:27:27,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2024-08-14 18:27:33,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-14 18:27:49,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2786620.0, ans=0.125 2024-08-14 18:27:51,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.316e+01 2.463e+01 2.771e+01 4.814e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-14 18:27:52,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.97 vs. 
limit=15.0 2024-08-14 18:27:57,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2786620.0, ans=0.125 2024-08-14 18:28:05,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2786720.0, ans=0.125 2024-08-14 18:28:25,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2786820.0, ans=0.125 2024-08-14 18:28:30,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3350, loss[loss=0.1147, beats_loss=0.01158, ecapa_loss=0.0001467, whisper_loss=0.1016, over 18485.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001525, whisper_loss=0.09124, over 3880029.92 frames. ], batch size: 75, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:28:30,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2024-08-14 18:28:52,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.04 vs. limit=10.0 2024-08-14 18:28:58,403 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 18:29:04,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787120.0, ans=0.1 2024-08-14 18:29:11,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2787220.0, ans=0.125 2024-08-14 18:29:31,868 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 18:29:39,670 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3400, loss[loss=0.09727, beats_loss=0.01067, ecapa_loss=0.0001476, whisper_loss=0.08512, over 19034.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001529, whisper_loss=0.09106, over 3890951.00 frames. ], batch size: 78, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:29:55,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2787520.0, ans=0.0 2024-08-14 18:30:07,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.360e+01 2.659e+01 3.040e+01 2.409e+02, threshold=5.318e+01, percent-clipped=1.0 2024-08-14 18:30:18,002 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 18:30:34,669 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 18:30:40,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2787820.0, ans=0.125 2024-08-14 18:30:48,134 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3450, loss[loss=0.09358, beats_loss=0.01238, ecapa_loss=0.0001626, whisper_loss=0.07958, over 21746.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001536, whisper_loss=0.09016, over 3883768.26 frames. 
], batch size: 94, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:30:48,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2787920.0, ans=0.125 2024-08-14 18:30:59,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2787920.0, ans=0.2 2024-08-14 18:31:01,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2788020.0, ans=0.2 2024-08-14 18:31:07,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.87 vs. limit=22.5 2024-08-14 18:31:10,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2788020.0, ans=0.125 2024-08-14 18:31:30,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=15.0 2024-08-14 18:31:30,831 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 18:31:39,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2788220.0, ans=0.0 2024-08-14 18:31:41,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2788320.0, ans=0.1 2024-08-14 18:31:42,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-14 18:31:55,062 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3500, loss[loss=0.0983, beats_loss=0.01247, ecapa_loss=0.0001252, whisper_loss=0.08459, over 22484.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001531, whisper_loss=0.09055, over 3876316.53 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:32:20,785 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-14 18:32:23,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.371e+01 2.585e+01 2.886e+01 6.376e+01, threshold=5.170e+01, percent-clipped=1.0 2024-08-14 18:32:24,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2788620.0, ans=0.0 2024-08-14 18:32:25,105 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 18:32:32,003 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-14 18:32:36,055 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-14 18:32:45,774 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 18:32:51,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2788820.0, ans=0.125 2024-08-14 18:32:52,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2788820.0, ans=0.0 2024-08-14 18:33:03,394 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3550, loss[loss=0.1077, beats_loss=0.01012, ecapa_loss=0.0001612, whisper_loss=0.096, over 22472.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001531, whisper_loss=0.09077, over 3888981.90 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:33:14,300 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 18:33:38,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2789120.0, ans=0.125 2024-08-14 18:33:40,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.31 vs. limit=22.5 2024-08-14 18:33:52,454 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 18:33:53,750 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 18:33:57,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2789320.0, ans=0.1 2024-08-14 18:34:03,484 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 18:34:11,512 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3600, loss[loss=0.09228, beats_loss=0.01343, ecapa_loss=0.000162, whisper_loss=0.07723, over 22354.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01079, ecapa_loss=0.0001531, whisper_loss=0.09006, over 3916938.40 frames. ], batch size: 93, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:34:27,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=12.0 2024-08-14 18:34:32,276 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 18:34:36,145 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 18:34:39,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.323e+01 2.540e+01 2.892e+01 4.287e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-14 18:34:52,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2789720.0, ans=0.0 2024-08-14 18:34:54,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2024-08-14 18:34:56,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2789720.0, ans=0.0 2024-08-14 18:34:58,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2789720.0, ans=0.1 2024-08-14 18:35:03,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2789720.0, ans=0.125 2024-08-14 18:35:08,882 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 18:35:12,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2789820.0, ans=0.125 2024-08-14 18:35:19,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3650, loss[loss=0.128, beats_loss=0.007162, ecapa_loss=0.000161, whisper_loss=0.1193, over 14953.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001539, whisper_loss=0.09076, over 3890868.62 frames. ], batch size: 58, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:35:22,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=15.0 2024-08-14 18:35:33,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2790020.0, ans=0.0 2024-08-14 18:35:40,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2790020.0, ans=0.125 2024-08-14 18:35:42,161 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 18:35:49,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-14 18:35:50,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2024-08-14 18:36:01,276 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 18:36:10,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2790220.0, ans=0.2 2024-08-14 18:36:17,355 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 18:36:26,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3700, loss[loss=0.09289, beats_loss=0.01302, ecapa_loss=0.0001564, whisper_loss=0.07831, over 20473.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001548, whisper_loss=0.09019, over 3880005.00 frames. 
], batch size: 87, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:36:32,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2790420.0, ans=0.125 2024-08-14 18:36:54,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.282e+01 2.542e+01 2.895e+01 4.405e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 18:37:06,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-08-14 18:37:06,889 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 18:37:27,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2790820.0, ans=0.0 2024-08-14 18:37:28,144 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-14 18:37:33,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3750, loss[loss=0.1113, beats_loss=0.009028, ecapa_loss=0.0001887, whisper_loss=0.1004, over 18171.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001528, whisper_loss=0.09054, over 3881809.71 frames. 
], batch size: 73, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:37:38,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790920.0, ans=0.1 2024-08-14 18:37:42,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790920.0, ans=0.1 2024-08-14 18:37:54,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2791020.0, ans=0.125 2024-08-14 18:38:15,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=15.0 2024-08-14 18:38:25,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2791220.0, ans=0.125 2024-08-14 18:38:32,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2791320.0, ans=0.1 2024-08-14 18:38:41,674 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3800, loss[loss=0.08564, beats_loss=0.01025, ecapa_loss=0.0001718, whisper_loss=0.07366, over 17251.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001538, whisper_loss=0.09025, over 3871994.86 frames. ], batch size: 70, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:38:49,768 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 18:38:51,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2791420.0, ans=0.0 2024-08-14 18:38:53,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.37 vs. 
limit=15.0 2024-08-14 18:38:57,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2791520.0, ans=0.09899494936611666 2024-08-14 18:39:00,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2791520.0, ans=0.125 2024-08-14 18:39:02,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-14 18:39:05,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2791520.0, ans=0.1 2024-08-14 18:39:09,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.383e+01 2.672e+01 2.913e+01 4.805e+01, threshold=5.345e+01, percent-clipped=0.0 2024-08-14 18:39:20,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2791720.0, ans=0.125 2024-08-14 18:39:23,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2791720.0, ans=0.125 2024-08-14 18:39:24,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-14 18:39:33,732 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 18:39:40,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2791820.0, ans=0.125 2024-08-14 18:39:48,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3850, loss[loss=0.1174, beats_loss=0.008858, ecapa_loss=0.0001467, whisper_loss=0.107, over 23367.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01082, ecapa_loss=0.000154, whisper_loss=0.08997, over 3878192.28 frames. ], batch size: 88, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:39:55,325 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 18:39:57,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.14 vs. limit=22.5 2024-08-14 18:40:16,611 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-14 18:40:22,411 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 18:40:31,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2792220.0, ans=0.0 2024-08-14 18:40:51,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-08-14 18:40:55,863 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3900, loss[loss=0.1266, beats_loss=0.007598, ecapa_loss=0.0001499, whisper_loss=0.1175, over 17454.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001541, whisper_loss=0.09128, over 3877674.09 frames. ], batch size: 64, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:40:57,479 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 18:41:06,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2792420.0, ans=0.125 2024-08-14 18:41:07,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.36 vs. 
limit=15.0 2024-08-14 18:41:18,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2792520.0, ans=0.09899494936611666 2024-08-14 18:41:24,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.416e+01 2.719e+01 3.088e+01 3.540e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-14 18:41:25,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2792620.0, ans=0.0 2024-08-14 18:41:33,080 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 18:41:43,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2792720.0, ans=0.0 2024-08-14 18:41:49,447 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 18:41:56,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2792820.0, ans=0.125 2024-08-14 18:41:59,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-14 18:42:03,887 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 3950, loss[loss=0.1364, beats_loss=0.008595, ecapa_loss=0.0001371, whisper_loss=0.1264, over 19960.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01061, ecapa_loss=0.000156, whisper_loss=0.09216, over 3884628.30 frames. 
], batch size: 72, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:42:04,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792920.0, ans=0.1 2024-08-14 18:42:07,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2792920.0, ans=0.0 2024-08-14 18:42:09,382 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 18:42:10,614 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 18:42:13,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2792920.0, ans=0.125 2024-08-14 18:42:14,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2792920.0, ans=0.125 2024-08-14 18:42:17,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=15.0 2024-08-14 18:42:19,654 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 18:42:26,687 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 18:42:29,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2793120.0, ans=0.1 2024-08-14 18:42:34,523 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 18:43:00,170 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
21 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-14 18:43:10,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4000, loss[loss=0.09829, beats_loss=0.009982, ecapa_loss=0.0001762, whisper_loss=0.08655, over 22314.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01059, ecapa_loss=0.0001556, whisper_loss=0.09209, over 3909217.43 frames. ], batch size: 95, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:43:11,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2793420.0, ans=0.1 2024-08-14 18:43:12,033 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 18:43:19,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2793420.0, ans=0.0 2024-08-14 18:43:36,181 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=12.0 2024-08-14 18:43:39,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.386e+01 2.659e+01 3.102e+01 4.594e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 18:43:40,999 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-14 18:43:49,572 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 18:43:50,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2793620.0, ans=0.1 2024-08-14 18:43:55,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2793720.0, ans=0.125 2024-08-14 18:44:12,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2793820.0, ans=15.0 2024-08-14 18:44:19,639 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4050, loss[loss=0.1042, beats_loss=0.009694, ecapa_loss=0.0001655, whisper_loss=0.0929, over 15721.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001555, whisper_loss=0.09122, over 3859614.41 frames. ], batch size: 63, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:44:20,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2793920.0, ans=0.125 2024-08-14 18:44:24,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2793920.0, ans=0.125 2024-08-14 18:44:40,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.39 vs. limit=10.0 2024-08-14 18:44:43,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-14 18:44:52,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794120.0, ans=0.1 2024-08-14 18:44:56,531 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
33 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-14 18:45:12,992 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 18:45:27,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4100, loss[loss=0.1026, beats_loss=0.009374, ecapa_loss=0.0002047, whisper_loss=0.09116, over 14399.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.0001558, whisper_loss=0.09176, over 3874165.71 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:45:49,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2794520.0, ans=0.2 2024-08-14 18:45:57,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.354e+01 2.603e+01 2.918e+01 6.130e+01, threshold=5.207e+01, percent-clipped=1.0 2024-08-14 18:46:23,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2794820.0, ans=0.125 2024-08-14 18:46:36,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4150, loss[loss=0.08743, beats_loss=0.01211, ecapa_loss=0.0001176, whisper_loss=0.07414, over 22898.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01062, ecapa_loss=0.0001549, whisper_loss=0.09198, over 3882080.40 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:46:43,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=22.5 2024-08-14 18:46:46,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794920.0, ans=0.1 2024-08-14 18:47:03,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.09 vs. 
limit=15.0 2024-08-14 18:47:12,708 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 18:47:14,305 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 18:47:14,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2795120.0, ans=15.0 2024-08-14 18:47:17,224 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 18:47:22,596 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 17 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 18:47:26,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2795220.0, ans=0.125 2024-08-14 18:47:29,435 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 18:47:30,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2795320.0, ans=0.125 2024-08-14 18:47:44,035 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4200, loss[loss=0.1046, beats_loss=0.01177, ecapa_loss=0.0001308, whisper_loss=0.09154, over 16038.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001543, whisper_loss=0.09166, over 3895034.90 frames. ], batch size: 62, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:47:49,565 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 18:47:53,711 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 18:47:59,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. 
limit=10.0 2024-08-14 18:48:02,057 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 18:48:12,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.416e+01 2.672e+01 2.930e+01 6.892e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 18:48:14,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2795620.0, ans=0.1 2024-08-14 18:48:14,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=15.0 2024-08-14 18:48:26,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2795720.0, ans=0.1 2024-08-14 18:48:33,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.32 vs. limit=10.0 2024-08-14 18:48:47,017 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 18:48:47,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2795820.0, ans=0.1 2024-08-14 18:48:52,306 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4250, loss[loss=0.09805, beats_loss=0.008428, ecapa_loss=0.0001635, whisper_loss=0.08799, over 14337.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001543, whisper_loss=0.09102, over 3884296.82 frames. ], batch size: 58, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:49:15,655 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 18:49:25,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-14 18:49:32,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2796120.0, ans=0.1 2024-08-14 18:49:50,745 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 18:49:58,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-14 18:50:02,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4300, loss[loss=0.0941, beats_loss=0.009308, ecapa_loss=0.000129, whisper_loss=0.0835, over 16996.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001543, whisper_loss=0.09015, over 3886932.50 frames. ], batch size: 64, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:50:11,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2796420.0, ans=0.125 2024-08-14 18:50:14,554 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.456e+01 2024-08-14 18:50:32,226 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 18:50:32,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2796620.0, ans=0.125 2024-08-14 18:50:34,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.393e+01 2.675e+01 3.079e+01 4.317e+01, threshold=5.351e+01, percent-clipped=0.0 2024-08-14 18:50:38,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0 2024-08-14 18:50:51,693 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 18:50:57,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=15.0 2024-08-14 18:51:06,688 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 18:51:07,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2796820.0, ans=0.02 2024-08-14 18:51:18,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4350, loss[loss=0.08184, beats_loss=0.01049, ecapa_loss=0.0001368, whisper_loss=0.06998, over 17143.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001538, whisper_loss=0.08992, over 3845462.93 frames. ], batch size: 69, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:51:27,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-08-14 18:51:27,911 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
12 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 18:51:30,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2796920.0, ans=0.125 2024-08-14 18:51:34,960 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 18:51:39,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2797020.0, ans=0.125 2024-08-14 18:51:44,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-14 18:51:45,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2797020.0, ans=0.1 2024-08-14 18:51:58,720 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 18:52:02,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2797220.0, ans=0.2 2024-08-14 18:52:11,194 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 18:52:16,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2797220.0, ans=15.0 2024-08-14 18:52:18,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-14 18:52:22,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. 
limit=15.0 2024-08-14 18:52:24,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=2797320.0, ans=0.1 2024-08-14 18:52:32,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4400, loss[loss=0.1098, beats_loss=0.009582, ecapa_loss=0.0001548, whisper_loss=0.09871, over 15828.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001544, whisper_loss=0.09037, over 3831899.09 frames. ], batch size: 59, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:52:53,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2797520.0, ans=0.95 2024-08-14 18:53:00,459 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 18:53:04,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.379e+01 2.659e+01 2.952e+01 7.187e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-14 18:53:32,141 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 18:53:35,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2797820.0, ans=0.125 2024-08-14 18:53:38,500 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:53:48,953 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4450, loss[loss=0.1071, beats_loss=0.008322, ecapa_loss=0.0002125, whisper_loss=0.09663, over 15670.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001556, whisper_loss=0.09124, over 3836228.20 frames. 
], batch size: 63, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:53:52,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2797920.0, ans=0.0 2024-08-14 18:53:54,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2797920.0, ans=0.125 2024-08-14 18:53:57,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2797920.0, ans=0.0 2024-08-14 18:54:02,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5 2024-08-14 18:54:15,811 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 18:54:30,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-08-14 18:54:31,847 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:54:51,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2798320.0, ans=0.125 2024-08-14 18:54:56,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2798320.0, ans=0.0 2024-08-14 18:55:06,967 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4500, loss[loss=0.1151, beats_loss=0.009695, ecapa_loss=0.0001767, whisper_loss=0.1036, over 21256.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01051, ecapa_loss=0.0001563, whisper_loss=0.09186, over 3848163.65 frames. 
], batch size: 90, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:55:09,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2798420.0, ans=0.1 2024-08-14 18:55:13,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2798420.0, ans=0.125 2024-08-14 18:55:17,950 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 18:55:18,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2798420.0, ans=0.125 2024-08-14 18:55:42,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.288e+01 2.644e+01 2.918e+01 3.847e+02, threshold=5.287e+01, percent-clipped=3.0 2024-08-14 18:55:46,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.42 vs. limit=22.5 2024-08-14 18:55:58,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2798720.0, ans=0.1 2024-08-14 18:56:01,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2798720.0, ans=0.125 2024-08-14 18:56:11,010 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2024-08-14 18:56:26,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4550, loss[loss=0.1237, beats_loss=0.01071, ecapa_loss=0.000125, whisper_loss=0.1118, over 23898.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01056, ecapa_loss=0.0001561, whisper_loss=0.09177, over 3875699.45 frames. 
], batch size: 92, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:56:33,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2798920.0, ans=0.125 2024-08-14 18:56:37,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2798920.0, ans=0.125 2024-08-14 18:56:58,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2799120.0, ans=0.125 2024-08-14 18:57:03,483 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 18:57:15,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2799220.0, ans=0.0 2024-08-14 18:57:22,461 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 18:57:32,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-14 18:57:34,774 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-14 18:57:43,384 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4600, loss[loss=0.1091, beats_loss=0.009454, ecapa_loss=0.0001598, whisper_loss=0.09801, over 18799.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001546, whisper_loss=0.09106, over 3871648.08 frames. ], batch size: 74, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:58:04,395 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 8 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 18:58:08,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.84 vs. 
limit=15.0 2024-08-14 18:58:15,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.391e+01 2.667e+01 2.855e+01 4.020e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-14 18:58:19,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2799620.0, ans=0.125 2024-08-14 18:58:27,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2799720.0, ans=0.1 2024-08-14 18:58:42,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2024-08-14 18:58:43,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2799820.0, ans=0.2 2024-08-14 18:58:53,677 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 18:58:55,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-14 18:58:58,046 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4650, loss[loss=0.08223, beats_loss=0.01371, ecapa_loss=0.0001154, whisper_loss=0.06737, over 17633.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001554, whisper_loss=0.09061, over 3879925.54 frames. ], batch size: 69, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:58:59,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2799920.0, ans=0.2 2024-08-14 18:59:33,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. 
limit=15.0 2024-08-14 18:59:58,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.23 vs. limit=22.5 2024-08-14 19:00:01,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2800320.0, ans=0.05 2024-08-14 19:00:05,073 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 19:00:17,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4700, loss[loss=0.07913, beats_loss=0.01227, ecapa_loss=0.0002111, whisper_loss=0.06475, over 20532.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001549, whisper_loss=0.09088, over 3874757.70 frames. ], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:00:22,106 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 19:00:29,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2800420.0, ans=0.0 2024-08-14 19:00:30,595 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 19:00:37,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-08-14 19:00:39,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2800520.0, ans=0.1 2024-08-14 19:00:42,645 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 19:00:48,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2800620.0, ans=0.125 2024-08-14 19:00:50,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.338e+01 2.588e+01 2.905e+01 3.899e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 19:00:55,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2800620.0, ans=0.0 2024-08-14 19:01:12,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2800720.0, ans=0.125 2024-08-14 19:01:32,060 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 19:01:33,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4750, loss[loss=0.1087, beats_loss=0.01114, ecapa_loss=0.0001377, whisper_loss=0.09618, over 23780.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001551, whisper_loss=0.09051, over 3853614.23 frames. ], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:01:45,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2800920.0, ans=0.125 2024-08-14 19:01:53,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2801020.0, ans=0.05 2024-08-14 19:01:54,189 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 19:02:00,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-14 19:02:17,378 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 19:02:20,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2801220.0, ans=0.0 2024-08-14 19:02:30,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2801220.0, ans=0.07 2024-08-14 19:02:39,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2801320.0, ans=0.0 2024-08-14 19:02:47,818 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4800, loss[loss=0.09527, beats_loss=0.01161, ecapa_loss=0.0001535, whisper_loss=0.08212, over 23237.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001562, whisper_loss=0.09109, over 3866320.60 frames. ], batch size: 95, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:02:59,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2801420.0, ans=0.125 2024-08-14 19:03:00,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2801420.0, ans=0.0 2024-08-14 19:03:06,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2801520.0, ans=0.125 2024-08-14 19:03:07,169 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 19:03:20,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.344e+01 2.546e+01 2.876e+01 4.578e+02, threshold=5.092e+01, percent-clipped=1.0 2024-08-14 19:04:01,152 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4850, loss[loss=0.1144, beats_loss=0.009998, ecapa_loss=0.0001788, whisper_loss=0.1026, over 22626.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001566, whisper_loss=0.09144, over 3894708.64 frames. 
], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:04:15,142 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 19:04:31,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802120.0, ans=0.1 2024-08-14 19:04:33,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2802120.0, ans=0.0 2024-08-14 19:04:34,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2802120.0, ans=0.0 2024-08-14 19:04:37,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2802120.0, ans=0.1 2024-08-14 19:04:38,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2802120.0, ans=0.125 2024-08-14 19:05:17,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4900, loss[loss=0.09353, beats_loss=0.01062, ecapa_loss=0.0001608, whisper_loss=0.08131, over 22059.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001558, whisper_loss=0.09096, over 3889759.13 frames. ], batch size: 89, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:05:22,201 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 19:05:25,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2802420.0, ans=0.0 2024-08-14 19:05:27,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2802420.0, ans=0.125 2024-08-14 19:05:52,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.355e+01 2.636e+01 2.883e+01 6.029e+01, threshold=5.271e+01, percent-clipped=1.0 2024-08-14 19:06:01,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2802620.0, ans=0.2 2024-08-14 19:06:35,289 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2024-08-14 19:06:36,037 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 19:06:36,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802820.0, ans=0.1 2024-08-14 19:06:38,446 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 4950, loss[loss=0.1071, beats_loss=0.009751, ecapa_loss=0.000151, whisper_loss=0.09581, over 23320.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01054, ecapa_loss=0.0001563, whisper_loss=0.09168, over 3883718.32 frames. ], batch size: 92, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:06:46,348 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 19:06:49,159 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 19:06:49,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2802920.0, ans=0.1 2024-08-14 19:07:00,718 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 19:07:10,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2803120.0, ans=0.125 2024-08-14 19:07:24,470 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 24 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-14 19:07:26,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2803220.0, ans=0.2 2024-08-14 19:07:30,310 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 19:07:34,644 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 19:07:54,043 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5000, loss[loss=0.08815, beats_loss=0.01303, ecapa_loss=0.0001229, whisper_loss=0.07389, over 18357.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01053, ecapa_loss=0.0001571, whisper_loss=0.09166, over 3873990.22 frames. ], batch size: 73, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:07:55,964 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 20 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 19:08:01,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2803420.0, ans=0.1 2024-08-14 19:08:23,465 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 19:08:24,850 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-14 19:08:25,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.350e+01 2.620e+01 2.995e+01 1.741e+02, threshold=5.241e+01, percent-clipped=2.0 2024-08-14 19:08:39,294 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-14 19:08:44,980 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 19:09:06,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5050, loss[loss=0.09018, beats_loss=0.01057, ecapa_loss=0.0001536, whisper_loss=0.07808, over 13737.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01065, ecapa_loss=0.0001554, whisper_loss=0.09113, over 3868719.82 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:09:21,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2804020.0, ans=0.1 2024-08-14 19:09:27,596 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 28 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 19:09:31,604 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 19:09:37,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2804120.0, ans=0.125 2024-08-14 19:09:44,994 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-14 19:10:06,159 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 19:10:16,767 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 19:10:21,042 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5100, loss[loss=0.1071, beats_loss=0.01306, ecapa_loss=0.0001124, whisper_loss=0.09288, over 23158.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001535, whisper_loss=0.09141, over 3871927.58 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:10:25,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2804420.0, ans=0.125 2024-08-14 19:10:26,926 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 19:10:45,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2804520.0, ans=0.125 2024-08-14 19:10:45,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2804520.0, ans=0.0 2024-08-14 19:10:47,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2804520.0, ans=0.1 2024-08-14 19:10:52,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2804620.0, ans=0.0 2024-08-14 19:10:56,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.368e+01 2.597e+01 2.934e+01 4.134e+01, threshold=5.194e+01, percent-clipped=0.0 2024-08-14 19:11:13,482 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 19:11:24,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2804820.0, ans=0.1 2024-08-14 19:11:24,637 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2024-08-14 19:11:40,537 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5150, loss[loss=0.1022, beats_loss=0.01121, ecapa_loss=0.0001265, whisper_loss=0.08975, over 15668.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01067, ecapa_loss=0.0001535, whisper_loss=0.09235, over 3878025.78 frames. ], batch size: 59, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:11:54,230 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 19:11:54,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2805020.0, ans=0.07 2024-08-14 19:12:00,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2805020.0, ans=0.2 2024-08-14 19:12:00,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2805020.0, ans=0.0 2024-08-14 19:12:22,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2805120.0, ans=0.125 2024-08-14 19:12:43,252 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 19:12:54,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5200, loss[loss=0.1062, beats_loss=0.01073, ecapa_loss=0.0001548, whisper_loss=0.09392, over 20274.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001525, whisper_loss=0.0914, over 3882439.85 frames. 
], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:13:11,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2805520.0, ans=0.025 2024-08-14 19:13:12,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2805520.0, ans=0.125 2024-08-14 19:13:15,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2805520.0, ans=0.1 2024-08-14 19:13:22,550 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 19:13:25,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.14 vs. limit=10.0 2024-08-14 19:13:28,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.358e+01 2.582e+01 2.808e+01 4.877e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 19:13:32,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2805620.0, ans=0.0 2024-08-14 19:13:40,876 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.670e+05 2024-08-14 19:13:48,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-08-14 19:13:49,418 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
19 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 19:13:50,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2805720.0, ans=0.125 2024-08-14 19:14:10,337 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5250, loss[loss=0.1005, beats_loss=0.008486, ecapa_loss=0.0001478, whisper_loss=0.09058, over 16339.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001527, whisper_loss=0.09151, over 3833729.07 frames. ], batch size: 65, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:14:28,845 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 19:14:37,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2806020.0, ans=0.125 2024-08-14 19:14:40,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2806120.0, ans=0.0 2024-08-14 19:14:40,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2806120.0, ans=0.125 2024-08-14 19:14:44,593 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 19:14:48,717 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 19:14:56,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-14 19:14:56,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-14 19:15:00,490 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 19:15:00,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806220.0, ans=0.1 2024-08-14 19:15:20,171 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 19:15:27,954 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5300, loss[loss=0.07758, beats_loss=0.01311, ecapa_loss=0.0001664, whisper_loss=0.06281, over 19009.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01059, ecapa_loss=0.0001516, whisper_loss=0.09179, over 3857033.76 frames. ], batch size: 79, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:15:43,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-14 19:16:02,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.456e+01 2.845e+01 4.034e+01, threshold=4.912e+01, percent-clipped=0.0 2024-08-14 19:16:02,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2806620.0, ans=0.0 2024-08-14 19:16:07,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2806620.0, ans=0.0 2024-08-14 19:16:08,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2806620.0, ans=0.2 2024-08-14 19:16:19,390 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. 
limit=6.0 2024-08-14 19:16:21,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2806720.0, ans=0.2 2024-08-14 19:16:39,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2024-08-14 19:16:44,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2806920.0, ans=0.125 2024-08-14 19:16:45,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5350, loss[loss=0.09253, beats_loss=0.01143, ecapa_loss=0.0001526, whisper_loss=0.07958, over 21685.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001521, whisper_loss=0.09137, over 3859817.03 frames. ], batch size: 87, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:16:50,761 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 19:17:10,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2807020.0, ans=0.1 2024-08-14 19:17:22,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2807120.0, ans=0.2 2024-08-14 19:17:27,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2807120.0, ans=0.125 2024-08-14 19:17:41,995 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 19:17:48,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. 
limit=12.0 2024-08-14 19:18:02,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2807320.0, ans=0.125 2024-08-14 19:18:13,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5400, loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.0001534, whisper_loss=0.0915, over 20082.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001514, whisper_loss=0.09163, over 3851959.67 frames. ], batch size: 81, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:18:16,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807420.0, ans=0.1 2024-08-14 19:18:31,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2807520.0, ans=0.125 2024-08-14 19:18:39,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2807520.0, ans=0.125 2024-08-14 19:18:50,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.371e+01 2.761e+01 3.113e+01 5.866e+01, threshold=5.523e+01, percent-clipped=1.0 2024-08-14 19:18:53,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-14 19:18:55,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2807620.0, ans=0.2 2024-08-14 19:18:55,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. 
limit=15.0 2024-08-14 19:19:13,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2807720.0, ans=0.125 2024-08-14 19:19:17,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2807720.0, ans=0.2 2024-08-14 19:19:43,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5450, loss[loss=0.1158, beats_loss=0.008742, ecapa_loss=0.0001753, whisper_loss=0.1053, over 17470.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001522, whisper_loss=0.09104, over 3863370.03 frames. ], batch size: 69, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:19:53,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2807920.0, ans=0.125 2024-08-14 19:20:41,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2808120.0, ans=0.125 2024-08-14 19:21:03,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2808320.0, ans=0.125 2024-08-14 19:21:08,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2024-08-14 19:21:12,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2808320.0, ans=0.2 2024-08-14 19:21:23,753 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5500, loss[loss=0.1101, beats_loss=0.01162, ecapa_loss=0.0001485, whisper_loss=0.09704, over 23256.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001531, whisper_loss=0.09075, over 3864471.49 frames. 
], batch size: 94, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:21:26,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-14 19:21:27,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2808420.0, ans=0.0 2024-08-14 19:21:33,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808420.0, ans=0.1 2024-08-14 19:22:02,869 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 19:22:09,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.424e+01 2.779e+01 3.099e+01 3.330e+02, threshold=5.557e+01, percent-clipped=2.0 2024-08-14 19:23:11,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5550, loss[loss=0.06793, beats_loss=0.01425, ecapa_loss=0.000154, whisper_loss=0.05214, over 13837.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001539, whisper_loss=0.09067, over 3867051.49 frames. ], batch size: 59, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:23:33,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2809020.0, ans=0.125 2024-08-14 19:23:57,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2809120.0, ans=0.125 2024-08-14 19:24:00,237 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
29 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 19:24:12,791 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:24:12,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2809220.0, ans=0.0 2024-08-14 19:24:29,551 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 19:24:32,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2809220.0, ans=0.5 2024-08-14 19:24:36,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2809320.0, ans=0.0 2024-08-14 19:24:38,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2809320.0, ans=0.02 2024-08-14 19:24:51,711 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 19:24:52,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5600, loss[loss=0.09508, beats_loss=0.01102, ecapa_loss=0.0001708, whisper_loss=0.08235, over 20833.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.000153, whisper_loss=0.09029, over 3873728.37 frames. 
], batch size: 88, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:25:24,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.292e+01 2.694e+01 2.993e+01 3.874e+01, threshold=5.387e+01, percent-clipped=0.0 2024-08-14 19:25:29,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2809620.0, ans=0.95 2024-08-14 19:25:33,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2809620.0, ans=0.125 2024-08-14 19:25:34,698 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 19:25:35,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2809720.0, ans=0.2 2024-08-14 19:25:37,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2809720.0, ans=0.125 2024-08-14 19:25:53,398 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 19:25:55,063 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 19:25:58,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2809820.0, ans=0.125 2024-08-14 19:26:02,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2809820.0, ans=0.125 2024-08-14 19:26:02,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. 
limit=22.5 2024-08-14 19:26:04,766 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5650, loss[loss=0.09138, beats_loss=0.01208, ecapa_loss=0.0001309, whisper_loss=0.07799, over 22260.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01081, ecapa_loss=0.000153, whisper_loss=0.08976, over 3903656.02 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:26:15,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2809920.0, ans=0.1 2024-08-14 19:26:21,489 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 19:26:23,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2810020.0, ans=0.2 2024-08-14 19:26:36,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2810120.0, ans=0.04949747468305833 2024-08-14 19:26:48,548 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 19:26:48,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2810220.0, ans=0.125 2024-08-14 19:27:06,785 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.204e+01 2024-08-14 19:27:14,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2810320.0, ans=0.125 2024-08-14 19:27:19,456 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5700, loss[loss=0.09518, beats_loss=0.007224, ecapa_loss=0.0001997, whisper_loss=0.08596, over 14171.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.0001535, whisper_loss=0.08983, over 3928571.20 frames. 
], batch size: 57, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:27:32,302 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:27:47,718 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 19:27:51,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.307e+01 2.514e+01 2.816e+01 4.087e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-14 19:27:55,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2810620.0, ans=0.125 2024-08-14 19:28:00,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2810620.0, ans=0.1 2024-08-14 19:28:01,429 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-14 19:28:05,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2810720.0, ans=0.0 2024-08-14 19:28:14,881 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-08-14 19:28:17,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2810820.0, ans=0.125 2024-08-14 19:28:27,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2810820.0, ans=0.125 2024-08-14 19:28:30,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2810820.0, ans=0.125 2024-08-14 19:28:31,554 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 19:28:32,838 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5750, loss[loss=0.08947, beats_loss=0.01324, ecapa_loss=0.0001443, whisper_loss=0.07479, over 19559.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01081, ecapa_loss=0.0001538, whisper_loss=0.08921, over 3917097.85 frames. ], batch size: 81, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:28:39,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810920.0, ans=0.1 2024-08-14 19:28:39,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810920.0, ans=0.1 2024-08-14 19:29:15,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2811120.0, ans=0.125 2024-08-14 19:29:23,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2024-08-14 19:29:34,434 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 19:29:42,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2811320.0, ans=0.2 2024-08-14 19:29:49,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5800, loss[loss=0.08298, beats_loss=0.00993, ecapa_loss=0.0001579, whisper_loss=0.07147, over 16820.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01079, ecapa_loss=0.0001538, whisper_loss=0.08913, over 3893754.70 frames. ], batch size: 66, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:29:50,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2811420.0, ans=0.0 2024-08-14 19:30:09,377 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
28 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-14 19:30:14,062 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:30:14,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2811520.0, ans=0.2 2024-08-14 19:30:17,044 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 19:30:22,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.246e+01 2.501e+01 2.765e+01 4.187e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-14 19:30:47,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2811820.0, ans=0.125 2024-08-14 19:30:48,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2811820.0, ans=15.0 2024-08-14 19:30:50,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=12.0 2024-08-14 19:30:54,527 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 19:30:56,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2811820.0, ans=0.04949747468305833 2024-08-14 19:31:03,307 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5850, loss[loss=0.0996, beats_loss=0.01121, ecapa_loss=0.0001355, whisper_loss=0.08703, over 21427.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01082, ecapa_loss=0.0001537, whisper_loss=0.08896, over 3909156.83 frames. 
], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:31:07,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2811920.0, ans=0.125 2024-08-14 19:31:12,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5 2024-08-14 19:31:16,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2024-08-14 19:31:30,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2812020.0, ans=0.1 2024-08-14 19:31:40,275 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 14 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 19:31:42,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2812120.0, ans=0.0 2024-08-14 19:32:07,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-14 19:32:16,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5900, loss[loss=0.1089, beats_loss=0.01141, ecapa_loss=0.0001486, whisper_loss=0.09603, over 21787.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01081, ecapa_loss=0.0001528, whisper_loss=0.08932, over 3885235.73 frames. 
], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:32:32,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2812520.0, ans=0.125
2024-08-14 19:32:36,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2812520.0, ans=0.125
2024-08-14 19:32:49,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.347e+01 2.667e+01 3.027e+01 4.357e+01, threshold=5.334e+01, percent-clipped=0.0
2024-08-14 19:33:08,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0
2024-08-14 19:33:22,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2812820.0, ans=0.125
2024-08-14 19:33:30,986 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 5950, loss[loss=0.09389, beats_loss=0.009216, ecapa_loss=0.0002001, whisper_loss=0.08268, over 14359.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01083, ecapa_loss=0.0001544, whisper_loss=0.08943, over 3885742.13 frames. ], batch size: 59, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:33:32,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2812920.0, ans=0.0
2024-08-14 19:33:55,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2813020.0, ans=0.2
2024-08-14 19:34:04,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5
2024-08-14 19:34:04,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=22.5
2024-08-14 19:34:08,858 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 14 from Vox, 36 from AS
2024-08-14 19:34:13,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2813120.0, ans=0.2
2024-08-14 19:34:13,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0
2024-08-14 19:34:18,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2813220.0, ans=0.1
2024-08-14 19:34:22,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2813220.0, ans=0.125
2024-08-14 19:34:22,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2813220.0, ans=0.125
2024-08-14 19:34:23,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0
2024-08-14 19:34:33,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2813320.0, ans=0.02
2024-08-14 19:34:38,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2813320.0, ans=0.0
2024-08-14 19:34:45,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6000, loss[loss=0.0998, beats_loss=0.01235, ecapa_loss=0.0001175, whisper_loss=0.08627, over 23195.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01088, ecapa_loss=0.0001528, whisper_loss=0.08911, over 3892835.82 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:34:45,145 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-14 19:35:23,102 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005442, whisper_loss=0.2472, over 922467.00 frames.
2024-08-14 19:35:42,512 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on SV_voxceleb1: loss=0.004201, beats_loss=0, ecapa_loss=0.0004201, whisper_loss=0, over 939242.00 frames.
2024-08-14 19:37:36,253 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 19:37:36,257 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-14 19:38:01,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2813520.0, ans=0.125
2024-08-14 19:38:07,756 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 19:38:10,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.295e+01 2.518e+01 2.791e+01 2.335e+02, threshold=5.037e+01, percent-clipped=2.0
2024-08-14 19:38:16,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2813620.0, ans=0.125
2024-08-14 19:38:26,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2813720.0, ans=0.125
2024-08-14 19:38:31,121 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 from AS
2024-08-14 19:38:42,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0
2024-08-14 19:38:45,982 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 from AS
2024-08-14 19:38:50,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2813820.0, ans=0.125
2024-08-14 19:38:53,069 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6050, loss[loss=0.1144, beats_loss=0.01048, ecapa_loss=0.000153, whisper_loss=0.1024, over 22520.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001539, whisper_loss=0.09041, over 3880709.96 frames. ], batch size: 89, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:38:56,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2813920.0, ans=0.0
2024-08-14 19:39:01,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2813920.0, ans=0.0
2024-08-14 19:39:04,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2813920.0, ans=0.1
2024-08-14 19:39:11,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2814020.0, ans=0.125
2024-08-14 19:39:12,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2814020.0, ans=0.2
2024-08-14 19:39:14,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0
2024-08-14 19:39:27,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0
2024-08-14 19:39:28,575 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 from AS
2024-08-14 19:39:37,200 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS
2024-08-14 19:39:49,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2814220.0, ans=0.1
2024-08-14 19:39:53,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.05 vs. limit=22.5
2024-08-14 19:39:59,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0
2024-08-14 19:40:06,326 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6100, loss[loss=0.08895, beats_loss=0.01411, ecapa_loss=0.000153, whisper_loss=0.07331, over 17779.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001556, whisper_loss=0.09096, over 3901354.80 frames. ], batch size: 73, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:40:06,677 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 from AS
2024-08-14 19:40:12,406 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 from AS
2024-08-14 19:40:21,479 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 from AS
2024-08-14 19:40:38,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.270e+01 2.572e+01 2.867e+01 4.147e+01, threshold=5.145e+01, percent-clipped=0.0
2024-08-14 19:40:59,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2814720.0, ans=0.1
2024-08-14 19:41:02,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0
2024-08-14 19:41:18,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2814920.0, ans=0.2
2024-08-14 19:41:19,560 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6150, loss[loss=0.0799, beats_loss=0.01234, ecapa_loss=0.0001618, whisper_loss=0.06594, over 18996.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01089, ecapa_loss=0.0001552, whisper_loss=0.08987, over 3914729.90 frames. ], batch size: 82, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:41:28,532 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 from AS
2024-08-14 19:41:43,224 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 19:41:47,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2815120.0, ans=0.125
2024-08-14 19:41:59,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2815120.0, ans=0.125
2024-08-14 19:42:05,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2815220.0, ans=0.0
2024-08-14 19:42:32,971 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6200, loss[loss=0.08014, beats_loss=0.01196, ecapa_loss=0.000138, whisper_loss=0.06681, over 18886.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01097, ecapa_loss=0.0001544, whisper_loss=0.0889, over 3897564.90 frames. ], batch size: 76, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:42:41,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2815420.0, ans=0.1
2024-08-14 19:42:45,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5
2024-08-14 19:43:05,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.332e+01 2.614e+01 2.876e+01 4.461e+01, threshold=5.229e+01, percent-clipped=0.0
2024-08-14 19:43:30,397 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS
2024-08-14 19:43:32,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0
2024-08-14 19:43:48,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6250, loss[loss=0.06461, beats_loss=0.01371, ecapa_loss=0.0001192, whisper_loss=0.0497, over 13863.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01085, ecapa_loss=0.0001544, whisper_loss=0.08898, over 3900816.17 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:43:53,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2815920.0, ans=0.125
2024-08-14 19:44:01,275 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 from AS
2024-08-14 19:44:13,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2816020.0, ans=0.125
2024-08-14 19:44:26,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2816120.0, ans=0.0
2024-08-14 19:44:28,850 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 30 from Vox, 38 from AS
2024-08-14 19:44:42,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2816220.0, ans=0.125
2024-08-14 19:44:45,416 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 from AS
2024-08-14 19:44:53,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2816320.0, ans=0.0
2024-08-14 19:44:57,045 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 from AS
2024-08-14 19:45:01,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6300, loss[loss=0.1166, beats_loss=0.008113, ecapa_loss=0.0001872, whisper_loss=0.1066, over 22098.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01079, ecapa_loss=0.0001555, whisper_loss=0.08906, over 3887989.46 frames. ], batch size: 86, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:45:03,039 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 33 from LS+wenet, 13 from Vox, 22 from AS
2024-08-14 19:45:03,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2816420.0, ans=0.125
2024-08-14 19:45:08,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0
2024-08-14 19:45:28,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2816520.0, ans=0.0
2024-08-14 19:45:33,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.242e+01 2.428e+01 2.656e+01 5.822e+01, threshold=4.856e+01, percent-clipped=1.0
2024-08-14 19:45:46,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2816720.0, ans=0.125
2024-08-14 19:46:13,710 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6350, loss[loss=0.104, beats_loss=0.01112, ecapa_loss=0.0001826, whisper_loss=0.09107, over 21849.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01075, ecapa_loss=0.0001559, whisper_loss=0.09005, over 3939958.86 frames. ], batch size: 91, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:46:15,529 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 11 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 19:46:26,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2817020.0, ans=0.2
2024-08-14 19:46:45,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2817120.0, ans=0.125
2024-08-14 19:46:59,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2817220.0, ans=0.09899494936611666
2024-08-14 19:47:10,691 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 17 from Vox, 46 from AS
2024-08-14 19:47:28,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6400, loss[loss=0.09576, beats_loss=0.01083, ecapa_loss=0.0001629, whisper_loss=0.0833, over 19968.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.09049, over 3897659.31 frames. ], batch size: 81, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:47:29,921 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.54 vs. limit=15.0
2024-08-14 19:47:40,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2817420.0, ans=0.125
2024-08-14 19:47:58,314 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 22 from Vox, 23 from AS
2024-08-14 19:48:01,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.350e+01 2.618e+01 2.916e+01 9.868e+01, threshold=5.236e+01, percent-clipped=1.0
2024-08-14 19:48:01,345 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS
2024-08-14 19:48:07,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2817620.0, ans=0.125
2024-08-14 19:48:35,299 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 19:48:36,992 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0
2024-08-14 19:48:41,453 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 31 from LS+wenet, 15 from Vox, 28 from AS
2024-08-14 19:48:42,684 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6450, loss[loss=0.1257, beats_loss=0.009087, ecapa_loss=0.0001632, whisper_loss=0.115, over 19674.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001541, whisper_loss=0.0908, over 3938019.34 frames. ], batch size: 74, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:48:47,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2817920.0, ans=0.1
2024-08-14 19:48:58,859 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 from AS
2024-08-14 19:49:24,426 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 from AS
2024-08-14 19:49:26,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2818120.0, ans=0.125
2024-08-14 19:49:26,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2818120.0, ans=0.125
2024-08-14 19:49:36,571 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 from AS
2024-08-14 19:49:38,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2818220.0, ans=0.1
2024-08-14 19:49:40,536 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 32 from LS+wenet, 20 from Vox, 30 from AS
2024-08-14 19:49:47,923 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 30 from LS+wenet, 12 from Vox, 26 from AS
2024-08-14 19:50:00,103 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6500, loss[loss=0.09353, beats_loss=0.0108, ecapa_loss=0.000161, whisper_loss=0.08111, over 14659.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001542, whisper_loss=0.09113, over 3926685.14 frames. ], batch size: 58, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:50:35,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.395e+01 2.629e+01 2.951e+01 4.669e+01, threshold=5.259e+01, percent-clipped=0.0
2024-08-14 19:50:57,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2818720.0, ans=0.0
2024-08-14 19:51:16,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6550, loss[loss=0.1006, beats_loss=0.01098, ecapa_loss=0.0001441, whisper_loss=0.08818, over 16879.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001533, whisper_loss=0.09117, over 3962597.68 frames. ], batch size: 67, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:51:20,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2818920.0, ans=0.125
2024-08-14 19:51:21,551 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 18 from Vox, 21 from AS
2024-08-14 19:51:31,031 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 from AS
2024-08-14 19:51:46,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2819120.0, ans=0.05
2024-08-14 19:52:02,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2819220.0, ans=0.2
2024-08-14 19:52:12,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2819220.0, ans=0.125
2024-08-14 19:52:22,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2819320.0, ans=0.125
2024-08-14 19:52:24,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2819320.0, ans=0.2
2024-08-14 19:52:34,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2819420.0, ans=0.0
2024-08-14 19:52:36,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6600, loss[loss=0.1014, beats_loss=0.01127, ecapa_loss=0.000167, whisper_loss=0.08851, over 22395.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001539, whisper_loss=0.09169, over 3977397.85 frames. ], batch size: 91, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:53:08,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819620.0, ans=0.1
2024-08-14 19:53:13,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.460e+01 2.689e+01 3.191e+01 5.178e+01, threshold=5.378e+01, percent-clipped=0.0
2024-08-14 19:53:15,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0
2024-08-14 19:53:17,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2819620.0, ans=0.125
2024-08-14 19:53:22,968 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 11 from Vox, 31 from AS
2024-08-14 19:53:27,690 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 19:53:29,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2819720.0, ans=0.125
2024-08-14 19:53:30,809 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 from AS
2024-08-14 19:53:51,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819820.0, ans=0.1
2024-08-14 19:53:55,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6650, loss[loss=0.09765, beats_loss=0.01108, ecapa_loss=0.0001408, whisper_loss=0.08516, over 16180.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001541, whisper_loss=0.09142, over 3961178.53 frames. ], batch size: 63, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:53:57,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0
2024-08-14 19:54:01,675 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 from AS
2024-08-14 19:54:10,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2820020.0, ans=0.0
2024-08-14 19:54:13,342 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 16 from Vox, 38 from AS
2024-08-14 19:54:40,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2820120.0, ans=0.1
2024-08-14 19:54:43,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0
2024-08-14 19:55:05,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2820320.0, ans=0.1
2024-08-14 19:55:12,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2820320.0, ans=0.125
2024-08-14 19:55:15,208 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6700, loss[loss=0.1168, beats_loss=0.008514, ecapa_loss=0.00015, whisper_loss=0.1068, over 15190.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001542, whisper_loss=0.09087, over 3941710.14 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:55:19,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0
2024-08-14 19:55:36,162 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.08 vs. limit=15.0
2024-08-14 19:55:50,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.346e+01 2.527e+01 2.810e+01 4.755e+01, threshold=5.054e+01, percent-clipped=0.0
2024-08-14 19:56:18,968 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 25 from Vox, 24 from AS
2024-08-14 19:56:32,301 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6750, loss[loss=0.1238, beats_loss=0.009808, ecapa_loss=0.0001522, whisper_loss=0.1125, over 22299.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001552, whisper_loss=0.09141, over 3928440.26 frames. ], batch size: 84, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:56:39,228 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS
2024-08-14 19:56:46,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2820920.0, ans=0.1
2024-08-14 19:57:06,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-08-14 19:57:11,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2821120.0, ans=0.025
2024-08-14 19:57:17,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2821120.0, ans=0.0
2024-08-14 19:57:32,458 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 from AS
2024-08-14 19:57:39,415 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 19 from LS+wenet, 22 from Vox, 48 from AS
2024-08-14 19:57:50,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6800, loss[loss=0.09495, beats_loss=0.01127, ecapa_loss=0.0001581, whisper_loss=0.0821, over 15834.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001548, whisper_loss=0.09104, over 3932590.34 frames. ], batch size: 66, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:57:57,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2821420.0, ans=0.2
2024-08-14 19:57:57,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.81 vs. limit=10.0
2024-08-14 19:58:00,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2821420.0, ans=0.0
2024-08-14 19:58:03,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2821420.0, ans=0.0
2024-08-14 19:58:27,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.395e+01 2.601e+01 3.094e+01 9.420e+01, threshold=5.202e+01, percent-clipped=3.0
2024-08-14 19:58:31,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=22.5
2024-08-14 19:58:35,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821620.0, ans=0.125
2024-08-14 19:58:36,725 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 from AS
2024-08-14 19:58:38,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821720.0, ans=0.125
2024-08-14 19:58:43,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2821720.0, ans=0.125
2024-08-14 19:58:56,379 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS
2024-08-14 19:58:58,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0
2024-08-14 19:59:08,216 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6850, loss[loss=0.1025, beats_loss=0.009068, ecapa_loss=0.0001463, whisper_loss=0.09197, over 16866.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001546, whisper_loss=0.09067, over 3880737.87 frames. ], batch size: 67, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:59:25,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2822020.0, ans=0.0
2024-08-14 19:59:44,587 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 from AS
2024-08-14 19:59:49,255 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 from AS
2024-08-14 19:59:52,582 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS
2024-08-14 20:00:06,027 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 20 from Vox, 25 from AS
2024-08-14 20:00:23,625 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6900, loss[loss=0.1005, beats_loss=0.01193, ecapa_loss=0.0001395, whisper_loss=0.08717, over 22320.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001543, whisper_loss=0.09048, over 3884012.21 frames. ], batch size: 90, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 20:00:33,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2822420.0, ans=0.0
2024-08-14 20:00:34,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2822420.0, ans=0.125
2024-08-14 20:00:55,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2822620.0, ans=0.04949747468305833
2024-08-14 20:00:59,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.271e+01 2.537e+01 2.771e+01 4.123e+01, threshold=5.074e+01, percent-clipped=0.0
2024-08-14 20:01:33,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2822820.0, ans=15.0
2024-08-14 20:01:40,079 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 6950, loss[loss=0.1165, beats_loss=0.009615, ecapa_loss=0.0001899, whisper_loss=0.105, over 22954.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001542, whisper_loss=0.09082, over 3868825.44 frames. ], batch size: 95, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 20:01:40,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2822920.0, ans=0.125
2024-08-14 20:01:43,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2822920.0, ans=0.0
2024-08-14 20:01:57,175 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 32 from Vox, 37 from AS
2024-08-14 20:02:00,016 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 from AS
2024-08-14 20:02:19,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2823120.0, ans=0.125
2024-08-14 20:02:20,799 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 from AS
2024-08-14 20:02:34,349 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 18 from Vox, 41 from AS
2024-08-14 20:02:42,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2823320.0, ans=0.0
2024-08-14 20:02:42,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0
2024-08-14 20:02:45,017 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS
2024-08-14 20:02:46,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2823320.0, ans=0.125
2024-08-14 20:02:55,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7000, loss[loss=0.09727, beats_loss=0.01241, ecapa_loss=0.0001246, whisper_loss=0.08361, over 23571.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001541, whisper_loss=0.0908, over 3902012.68 frames. ], batch size: 91, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 20:03:00,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.04 vs. limit=10.0
2024-08-14 20:03:01,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2823420.0, ans=0.125
2024-08-14 20:03:25,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=22.5
2024-08-14 20:03:27,239 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS
2024-08-14 20:03:29,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.376e+01 2.618e+01 2.959e+01 4.269e+01, threshold=5.237e+01, percent-clipped=0.0
2024-08-14 20:03:31,472 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 27 from Vox, 24 from AS
2024-08-14 20:03:36,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2823620.0, ans=0.0
2024-08-14 20:03:43,184 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 from AS
2024-08-14 20:04:03,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.93 vs. limit=22.5
2024-08-14 20:04:09,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7050, loss[loss=0.1025, beats_loss=0.00889, ecapa_loss=0.0001776, whisper_loss=0.09184, over 20128.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001556, whisper_loss=0.09064, over 3872324.50 frames. ], batch size: 80, lr: 3.12e-03, grad_scale: 5.764607523034235e+17
2024-08-14 20:04:15,310 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 from AS
2024-08-14 20:04:20,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0
2024-08-14 20:04:22,153 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5
2024-08-14 20:04:27,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2824020.0, ans=0.035
2024-08-14 20:04:27,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2824020.0, ans=0.2
2024-08-14 20:04:29,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2824020.0, ans=0.125
2024-08-14 20:04:36,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2824020.0, ans=0.125
2024-08-14 20:04:39,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2824120.0, ans=0.125
2024-08-14 20:04:40,953 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 31 from LS+wenet, 20 from Vox, 21 from AS
2024-08-14 20:04:47,388 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 15 from Vox, 34 from AS
2024-08-14 20:04:55,452 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 23 from Vox, 24 from AS
2024-08-14 20:05:07,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2824220.0, ans=0.125
2024-08-14 20:05:14,225 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 from AS
2024-08-14 20:05:16,053 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 from AS
2024-08-14 20:05:24,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7100, loss[loss=0.1015, beats_loss=0.009526, ecapa_loss=0.0001705, whisper_loss=0.09025, over 19539.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001548, whisper_loss=0.0913, over 3877588.85 frames. ], batch size: 80, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 20:05:31,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2824420.0, ans=0.125
2024-08-14 20:05:45,927 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 from AS
2024-08-14 20:05:53,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2824620.0, ans=0.0
2024-08-14 20:06:00,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.271e+01 2.594e+01 2.929e+01 4.373e+01, threshold=5.188e+01, percent-clipped=0.0
2024-08-14 20:06:00,881 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 from AS
2024-08-14 20:06:11,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2824720.0, ans=0.0
2024-08-14 20:06:17,001 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 14 from Vox, 39 from AS
2024-08-14 20:06:18,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0
2024-08-14 20:06:29,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2824820.0, ans=0.1
2024-08-14 20:06:32,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2824820.0, ans=0.0
2024-08-14 20:06:35,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2824820.0, ans=0.95
2024-08-14 20:06:38,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7150, loss[loss=0.08683, beats_loss=0.0108, ecapa_loss=0.0001563, whisper_loss=0.07446, over 21770.00 frames.
], tot_loss[loss=0.1041, beats_loss=0.01062, ecapa_loss=0.000154, whisper_loss=0.09197, over 3895955.72 frames. ], batch size: 91, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:06:44,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=6.0 2024-08-14 20:06:44,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=2824920.0, ans=22.5 2024-08-14 20:06:49,588 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 20:07:02,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825020.0, ans=0.1 2024-08-14 20:07:06,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2024-08-14 20:07:13,921 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-08-14 20:07:45,927 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 20:07:47,474 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 20:07:47,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2825320.0, ans=0.05 2024-08-14 20:07:53,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7200, loss[loss=0.1151, beats_loss=0.00874, ecapa_loss=0.0001724, whisper_loss=0.1047, over 19520.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01065, ecapa_loss=0.0001549, whisper_loss=0.09108, over 3912416.53 frames. 
], batch size: 79, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:08:00,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2825420.0, ans=0.0 2024-08-14 20:08:03,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2825420.0, ans=0.0 2024-08-14 20:08:27,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.392e+01 2.665e+01 3.083e+01 4.439e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-14 20:08:30,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2825620.0, ans=0.125 2024-08-14 20:08:31,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2825620.0, ans=0.125 2024-08-14 20:08:32,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2825620.0, ans=0.04949747468305833 2024-08-14 20:08:44,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2024-08-14 20:09:01,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-08-14 20:09:06,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7250, loss[loss=0.1037, beats_loss=0.01134, ecapa_loss=0.000153, whisper_loss=0.09087, over 19086.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001533, whisper_loss=0.09023, over 3893159.74 frames. 
], batch size: 77, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:09:07,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2825920.0, ans=10.0 2024-08-14 20:09:15,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2825920.0, ans=0.125 2024-08-14 20:09:18,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0 2024-08-14 20:09:37,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-14 20:09:46,496 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-14 20:09:46,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2826120.0, ans=0.0 2024-08-14 20:09:51,212 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 20:10:04,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2024-08-14 20:10:19,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7300, loss[loss=0.07617, beats_loss=0.01239, ecapa_loss=0.0001417, whisper_loss=0.06237, over 14613.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001541, whisper_loss=0.09084, over 3889494.74 frames. 
], batch size: 60, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:11:08,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2826520.0, ans=0.125 2024-08-14 20:11:17,196 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 20:11:22,191 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 20:11:25,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2826620.0, ans=0.125 2024-08-14 20:11:26,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.413e+01 2.619e+01 3.021e+01 6.286e+01, threshold=5.238e+01, percent-clipped=1.0 2024-08-14 20:11:27,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.86 vs. limit=10.0 2024-08-14 20:11:35,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2826720.0, ans=0.0 2024-08-14 20:11:41,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2826720.0, ans=0.125 2024-08-14 20:11:59,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2826820.0, ans=0.125 2024-08-14 20:12:04,216 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7350, loss[loss=0.121, beats_loss=0.009467, ecapa_loss=0.000173, whisper_loss=0.1098, over 22922.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001532, whisper_loss=0.09039, over 3877162.78 frames. 
], batch size: 90, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:12:18,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2827020.0, ans=0.0 2024-08-14 20:12:20,955 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=12.0 2024-08-14 20:12:25,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2024-08-14 20:12:26,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2024-08-14 20:12:29,975 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 20:12:33,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2827120.0, ans=0.125 2024-08-14 20:12:39,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2827120.0, ans=0.0 2024-08-14 20:12:40,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2827120.0, ans=0.125 2024-08-14 20:12:50,899 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 20:13:11,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2827320.0, ans=0.1 2024-08-14 20:13:14,100 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 20:13:21,561 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7400, loss[loss=0.09376, beats_loss=0.009895, ecapa_loss=0.0001864, whisper_loss=0.08201, over 20695.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01077, ecapa_loss=0.0001538, whisper_loss=0.08975, over 3865112.91 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:13:28,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. limit=10.0 2024-08-14 20:13:29,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2827420.0, ans=0.0 2024-08-14 20:13:36,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2827520.0, ans=0.2 2024-08-14 20:13:37,129 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 32 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 20:13:49,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2827520.0, ans=0.0 2024-08-14 20:13:59,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.369e+01 2.692e+01 3.042e+01 1.751e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-14 20:14:12,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-08-14 20:14:15,223 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
18 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 20:14:15,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2827720.0, ans=0.125 2024-08-14 20:14:32,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2827820.0, ans=0.125 2024-08-14 20:14:40,244 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7450, loss[loss=0.1041, beats_loss=0.01014, ecapa_loss=0.0001836, whisper_loss=0.09208, over 19654.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01081, ecapa_loss=0.0001536, whisper_loss=0.08987, over 3836917.83 frames. ], batch size: 80, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:14:40,462 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-14 20:14:45,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2827920.0, ans=0.1 2024-08-14 20:14:47,571 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 20:14:55,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2827920.0, ans=0.025 2024-08-14 20:14:55,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2827920.0, ans=0.05 2024-08-14 20:14:56,226 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 20:15:05,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2828020.0, ans=0.125 2024-08-14 20:15:33,465 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 20:15:37,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2828220.0, ans=0.2 2024-08-14 20:16:15,666 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7500, loss[loss=0.1052, beats_loss=0.008277, ecapa_loss=0.0001411, whisper_loss=0.0955, over 15995.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01077, ecapa_loss=0.0001547, whisper_loss=0.08965, over 3846873.87 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:16:28,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2828420.0, ans=0.1 2024-08-14 20:16:35,676 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2024-08-14 20:16:52,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2828620.0, ans=0.0 2024-08-14 20:17:01,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.331e+01 2.565e+01 2.874e+01 3.652e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-14 20:17:05,557 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 20:17:51,656 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7550, loss[loss=0.1077, beats_loss=0.01085, ecapa_loss=0.0001559, whisper_loss=0.09532, over 21665.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001545, whisper_loss=0.09036, over 3852035.61 frames. ], batch size: 90, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:17:52,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.06 vs. 
limit=15.0 2024-08-14 20:18:01,744 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5 2024-08-14 20:18:18,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2829020.0, ans=0.125 2024-08-14 20:18:30,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2829120.0, ans=0.125 2024-08-14 20:18:39,544 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-14 20:18:46,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2829220.0, ans=0.015 2024-08-14 20:18:57,411 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 20:19:03,284 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-14 20:19:06,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2829320.0, ans=0.125 2024-08-14 20:19:25,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7600, loss[loss=0.09513, beats_loss=0.01087, ecapa_loss=0.0002192, whisper_loss=0.08207, over 21250.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.000155, whisper_loss=0.09017, over 3866338.58 frames. ], batch size: 94, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:19:33,610 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
26 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 20:19:57,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2829520.0, ans=0.125 2024-08-14 20:20:04,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2829620.0, ans=0.0 2024-08-14 20:20:08,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.345e+01 2.622e+01 3.093e+01 1.598e+02, threshold=5.244e+01, percent-clipped=3.0 2024-08-14 20:20:22,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2829720.0, ans=0.0 2024-08-14 20:20:23,910 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-14 20:20:35,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2829820.0, ans=0.125 2024-08-14 20:20:36,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2829820.0, ans=0.1 2024-08-14 20:20:43,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2829820.0, ans=0.125 2024-08-14 20:20:46,440 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7650, loss[loss=0.1098, beats_loss=0.01115, ecapa_loss=0.0001279, whisper_loss=0.09738, over 17951.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001542, whisper_loss=0.0913, over 3870077.66 frames. 
], batch size: 70, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:21:02,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2830020.0, ans=0.0 2024-08-14 20:21:03,048 WARNING [optim.py:496] (1/4) Scaling gradients by 0.061782095581293106, model_norm_threshold=52.43657684326172 2024-08-14 20:21:03,219 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.640e+05, grad_sumsq=1.648e+07, orig_rms_sq=9.952e-03 2024-08-14 20:21:06,404 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 36 from Vox, 34 fro AS 2024-08-14 20:21:08,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2830020.0, ans=0.125 2024-08-14 20:21:09,525 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 20:21:22,127 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 20:21:31,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-08-14 20:21:36,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=12.0 2024-08-14 20:21:40,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2830220.0, ans=0.125 2024-08-14 20:21:44,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2830320.0, ans=0.2 2024-08-14 20:21:53,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0 2024-08-14 20:21:54,574 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 20:21:57,166 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7700, loss[loss=0.09171, beats_loss=0.012, ecapa_loss=0.0001512, whisper_loss=0.07821, over 22207.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001549, whisper_loss=0.09114, over 3877514.30 frames. ], batch size: 91, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:21:58,987 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:22:05,529 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 20:22:07,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2830420.0, ans=0.2 2024-08-14 20:22:11,223 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 20:22:13,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2830520.0, ans=0.125 2024-08-14 20:22:24,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2830620.0, ans=0.5 2024-08-14 20:22:30,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.409e+01 2.589e+01 2.990e+01 8.487e+02, threshold=5.178e+01, percent-clipped=3.0 2024-08-14 20:22:52,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2830820.0, ans=0.125 2024-08-14 20:23:02,700 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
37 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 20:23:08,410 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7750, loss[loss=0.108, beats_loss=0.0098, ecapa_loss=0.0001494, whisper_loss=0.09668, over 20309.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001542, whisper_loss=0.09014, over 3853124.97 frames. ], batch size: 80, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:23:10,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2830920.0, ans=0.125 2024-08-14 20:23:14,247 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 20:23:20,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2830920.0, ans=0.125 2024-08-14 20:23:24,597 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.732e+01 2024-08-14 20:23:27,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.27 vs. 
limit=15.0 2024-08-14 20:23:30,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2831020.0, ans=0.2 2024-08-14 20:23:33,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831020.0, ans=0.1 2024-08-14 20:23:37,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2831120.0, ans=0.125 2024-08-14 20:23:42,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2831120.0, ans=0.125 2024-08-14 20:24:02,433 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2024-08-14 20:24:19,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7800, loss[loss=0.1041, beats_loss=0.01106, ecapa_loss=0.0002012, whisper_loss=0.091, over 20276.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001535, whisper_loss=0.09086, over 3868631.70 frames. ], batch size: 88, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:24:40,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2831520.0, ans=0.2 2024-08-14 20:24:41,557 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 20:24:54,724 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.359e+01 2.580e+01 2.928e+01 4.088e+01, threshold=5.160e+01, percent-clipped=0.0 2024-08-14 20:24:55,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2024-08-14 20:24:56,443 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-14 20:25:10,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2831720.0, ans=0.125 2024-08-14 20:25:14,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2831720.0, ans=0.125 2024-08-14 20:25:15,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2831720.0, ans=0.05 2024-08-14 20:25:22,398 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 20:25:22,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2831820.0, ans=0.0 2024-08-14 20:25:22,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2831820.0, ans=0.125 2024-08-14 20:25:24,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2831820.0, ans=0.0 2024-08-14 20:25:31,629 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-14 20:25:32,187 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7850, loss[loss=0.1118, beats_loss=0.007571, ecapa_loss=0.0001695, whisper_loss=0.1026, over 15613.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001529, whisper_loss=0.09055, over 3892759.79 frames. ], batch size: 62, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:25:37,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2831920.0, ans=0.125 2024-08-14 20:25:38,204 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-14 20:25:51,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2832020.0, ans=0.0 2024-08-14 20:25:54,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2832020.0, ans=0.2 2024-08-14 20:26:19,420 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 18 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 20:26:24,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2832220.0, ans=0.0 2024-08-14 20:26:43,268 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7900, loss[loss=0.09487, beats_loss=0.01033, ecapa_loss=0.0001833, whisper_loss=0.08271, over 14630.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.000153, whisper_loss=0.09116, over 3891257.58 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:26:51,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 20:26:54,141 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 20:26:57,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2832520.0, ans=0.125 2024-08-14 20:27:08,416 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
19 from LS+wenet, 23 from Vox, 32 from AS 2024-08-14 20:27:13,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2832620.0, ans=0.125 2024-08-14 20:27:18,402 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.337e+01 2.582e+01 2.870e+01 4.311e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 20:27:39,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2832720.0, ans=0.2 2024-08-14 20:27:48,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2832820.0, ans=0.5 2024-08-14 20:27:53,945 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 from AS 2024-08-14 20:27:56,463 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 7950, loss[loss=0.1124, beats_loss=0.009806, ecapa_loss=0.0001648, whisper_loss=0.1009, over 22271.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001536, whisper_loss=0.0913, over 3889566.96 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:27:58,192 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 from AS 2024-08-14 20:28:00,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=22.5 2024-08-14 20:28:05,926 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 19 from Vox, 32 from AS 2024-08-14 20:28:12,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2833020.0, ans=0.125 2024-08-14 20:28:19,936 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:28:36,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2833120.0, ans=0.1 2024-08-14 20:28:56,309 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 33 from Vox, 36 from AS 2024-08-14 20:28:59,151 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 from AS 2024-08-14 20:29:02,753 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=12.0 2024-08-14 20:29:06,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2833320.0, ans=0.0 2024-08-14 20:29:09,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8000, loss[loss=0.1078, beats_loss=0.009394, ecapa_loss=0.0001487, whisper_loss=0.09688, over 19779.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001532, whisper_loss=0.09152, over 3885293.64 frames. ], batch size: 75, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:29:17,283 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:29:26,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2833520.0, ans=0.125 2024-08-14 20:29:27,912 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 from AS 2024-08-14 20:29:34,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2024-08-14 20:29:35,231 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 from AS 2024-08-14 20:29:39,769 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.871e-01 2024-08-14 20:29:43,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.316e+01 2.668e+01 3.025e+01 4.748e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 20:29:58,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2833720.0, ans=0.02 2024-08-14 20:30:06,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2833820.0, ans=0.125 2024-08-14 20:30:20,488 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8050, loss[loss=0.09089, beats_loss=0.01059, ecapa_loss=0.0001553, whisper_loss=0.07874, over 22837.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001523, whisper_loss=0.09129, over 3923984.44 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:30:48,139 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 from AS 2024-08-14 20:31:03,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2024-08-14 20:31:04,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2834220.0, ans=0.5 2024-08-14 20:31:26,400 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
22 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 20:31:31,732 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8100, loss[loss=0.08867, beats_loss=0.009706, ecapa_loss=0.0001648, whisper_loss=0.07732, over 17477.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001535, whisper_loss=0.09147, over 3934998.87 frames. ], batch size: 70, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:31:48,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2834520.0, ans=0.0 2024-08-14 20:31:52,126 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS 2024-08-14 20:31:54,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-14 20:31:55,211 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS 2024-08-14 20:31:56,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2834520.0, ans=0.0 2024-08-14 20:32:04,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2834620.0, ans=0.125 2024-08-14 20:32:06,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.290e+01 2.522e+01 2.889e+01 4.208e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 20:32:09,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2834620.0, ans=0.1 2024-08-14 20:32:22,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2834720.0, ans=0.0 2024-08-14 20:32:24,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, 
batch_count=2834720.0, ans=0.5 2024-08-14 20:32:33,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2834820.0, ans=10.0 2024-08-14 20:32:39,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-14 20:32:41,007 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 from AS 2024-08-14 20:32:45,017 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8150, loss[loss=0.0943, beats_loss=0.01247, ecapa_loss=0.0001441, whisper_loss=0.0804, over 20734.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001537, whisper_loss=0.09137, over 3937191.78 frames. ], batch size: 86, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:32:47,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2834920.0, ans=0.0 2024-08-14 20:32:58,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2835020.0, ans=0.125 2024-08-14 20:33:07,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2835020.0, ans=0.125 2024-08-14 20:33:13,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-08-14 20:33:18,574 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 20:33:27,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2835120.0, ans=0.0 2024-08-14 20:33:31,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2835220.0, ans=0.125 2024-08-14 20:33:37,069 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:33:44,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5 2024-08-14 20:33:48,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2835320.0, ans=0.0 2024-08-14 20:33:50,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2835320.0, ans=0.125 2024-08-14 20:33:56,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2835320.0, ans=0.125 2024-08-14 20:33:58,476 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8200, loss[loss=0.09185, beats_loss=0.01235, ecapa_loss=0.0001533, whisper_loss=0.07797, over 14886.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001528, whisper_loss=0.09067, over 3912608.48 frames. ], batch size: 58, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:34:11,457 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 33 from LS+wenet, 18 from Vox, 34 from AS 2024-08-14 20:34:13,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2835520.0, ans=0.0 2024-08-14 20:34:19,043 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 from AS 2024-08-14 20:34:28,946 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 from AS 2024-08-14 20:34:32,438 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 20:34:33,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.288e+01 2.494e+01 2.883e+01 1.855e+02, threshold=4.988e+01, percent-clipped=1.0 2024-08-14 20:34:40,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2835620.0, ans=0.125 2024-08-14 20:34:41,240 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 39 from LS+wenet, 12 from Vox, 43 from AS 2024-08-14 20:34:53,114 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 17 from LS+wenet, 30 from Vox, 33 from AS 2024-08-14 20:34:57,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2835820.0, ans=0.07 2024-08-14 20:35:10,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8250, loss[loss=0.09931, beats_loss=0.009413, ecapa_loss=0.0001896, whisper_loss=0.088, over 19100.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.000153, whisper_loss=0.09098, over 3922385.82 frames. 
], batch size: 83, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:35:11,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2835920.0, ans=0.125 2024-08-14 20:35:30,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2836020.0, ans=0.0 2024-08-14 20:35:46,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2836120.0, ans=0.125 2024-08-14 20:36:06,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2836220.0, ans=0.125 2024-08-14 20:36:09,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836320.0, ans=0.1 2024-08-14 20:36:13,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2836320.0, ans=0.125 2024-08-14 20:36:23,079 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8300, loss[loss=0.1117, beats_loss=0.009733, ecapa_loss=0.0001719, whisper_loss=0.1003, over 20713.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001524, whisper_loss=0.09078, over 3933485.73 frames. ], batch size: 84, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:36:23,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2836420.0, ans=0.125 2024-08-14 20:36:39,757 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:36:51,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. 
limit=15.0 2024-08-14 20:36:54,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2836620.0, ans=0.125 2024-08-14 20:36:55,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836620.0, ans=0.1 2024-08-14 20:36:57,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.392e+01 2.726e+01 3.062e+01 2.103e+02, threshold=5.453e+01, percent-clipped=2.0 2024-08-14 20:37:01,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2836620.0, ans=0.0 2024-08-14 20:37:05,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2836720.0, ans=0.0 2024-08-14 20:37:18,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-08-14 20:37:34,525 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8350, loss[loss=0.09136, beats_loss=0.01133, ecapa_loss=0.0001409, whisper_loss=0.07863, over 17612.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001537, whisper_loss=0.09025, over 3944436.31 frames. ], batch size: 71, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:37:37,672 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 19 from Vox, 49 from AS 2024-08-14 20:38:08,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2837120.0, ans=0.025 2024-08-14 20:38:17,999 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
22 from LS+wenet, 25 from Vox, 28 from AS 2024-08-14 20:38:19,694 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:38:25,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2837220.0, ans=0.125 2024-08-14 20:38:42,742 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 20:38:43,066 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.521e-01 2024-08-14 20:38:45,364 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 20:38:46,947 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8400, loss[loss=0.1095, beats_loss=0.01126, ecapa_loss=0.0001722, whisper_loss=0.09654, over 22103.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001544, whisper_loss=0.0915, over 3936989.91 frames. ], batch size: 88, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:38:50,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2837420.0, ans=0.2 2024-08-14 20:39:03,512 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 from AS 2024-08-14 20:39:04,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2024-08-14 20:39:07,978 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 from AS 2024-08-14 20:39:15,484 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
35 from LS+wenet, 22 from Vox, 27 from AS 2024-08-14 20:39:22,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.308e+01 2.540e+01 2.813e+01 3.907e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 20:39:36,902 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 from AS 2024-08-14 20:39:45,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2837820.0, ans=0.125 2024-08-14 20:39:54,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2024-08-14 20:39:59,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8450, loss[loss=0.09359, beats_loss=0.009173, ecapa_loss=0.0001753, whisper_loss=0.08267, over 15501.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001549, whisper_loss=0.09171, over 3938291.84 frames. ], batch size: 60, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:40:03,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2837920.0, ans=0.0 2024-08-14 20:40:04,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2837920.0, ans=0.0 2024-08-14 20:40:08,885 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 13 from LS+wenet, 12 from Vox, 28 from AS 2024-08-14 20:40:23,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2838020.0, ans=0.1 2024-08-14 20:40:39,892 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
19 from LS+wenet, 20 from Vox, 33 from AS 2024-08-14 20:40:53,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2838220.0, ans=0.0 2024-08-14 20:41:02,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0 2024-08-14 20:41:11,480 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8500, loss[loss=0.1232, beats_loss=0.01033, ecapa_loss=0.0001442, whisper_loss=0.1114, over 22884.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001543, whisper_loss=0.09122, over 3913487.34 frames. ], batch size: 87, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:41:20,542 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 18 from LS+wenet, 19 from Vox, 55 from AS 2024-08-14 20:41:35,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2838520.0, ans=0.0 2024-08-14 20:41:36,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2838520.0, ans=0.2 2024-08-14 20:41:39,112 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-14 20:41:40,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2838620.0, ans=0.1 2024-08-14 20:41:45,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.375e+01 2.644e+01 3.031e+01 3.106e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 20:41:53,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2838720.0, ans=0.2 2024-08-14 20:41:58,652 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
30 from LS+wenet, 22 from Vox, 35 from AS 2024-08-14 20:42:04,432 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 29 from Vox, 31 from AS 2024-08-14 20:42:08,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2838820.0, ans=0.125 2024-08-14 20:42:15,798 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 25 from Vox, 28 from AS 2024-08-14 20:42:17,194 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 from AS 2024-08-14 20:42:20,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2838820.0, ans=0.125 2024-08-14 20:42:22,951 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8550, loss[loss=0.1183, beats_loss=0.009477, ecapa_loss=0.0001585, whisper_loss=0.1073, over 17430.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01059, ecapa_loss=0.0001532, whisper_loss=0.09176, over 3908977.25 frames. ], batch size: 67, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:42:34,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2838920.0, ans=0.0 2024-08-14 20:42:37,302 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 from AS 2024-08-14 20:42:41,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2839020.0, ans=0.0 2024-08-14 20:42:55,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.26 vs. 
limit=15.0 2024-08-14 20:42:59,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2839120.0, ans=0.125 2024-08-14 20:43:02,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2839120.0, ans=0.125 2024-08-14 20:43:10,712 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 from AS 2024-08-14 20:43:21,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.74 vs. limit=10.0 2024-08-14 20:43:24,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2839320.0, ans=0.95 2024-08-14 20:43:24,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2839320.0, ans=0.125 2024-08-14 20:43:35,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8600, loss[loss=0.1254, beats_loss=0.008751, ecapa_loss=0.0001352, whisper_loss=0.1153, over 24429.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.000152, whisper_loss=0.09135, over 3932107.68 frames. ], batch size: 91, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:44:01,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2839520.0, ans=0.015 2024-08-14 20:44:10,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.454e+01 2.758e+01 3.025e+01 4.750e+01, threshold=5.517e+01, percent-clipped=0.0 2024-08-14 20:44:13,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. 
limit=10.0 2024-08-14 20:44:28,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2839720.0, ans=0.0 2024-08-14 20:44:46,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2839820.0, ans=0.125 2024-08-14 20:44:49,455 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8650, loss[loss=0.1012, beats_loss=0.01171, ecapa_loss=0.0001474, whisper_loss=0.08801, over 23501.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001533, whisper_loss=0.09081, over 3935847.27 frames. ], batch size: 94, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:45:23,412 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 from AS 2024-08-14 20:45:33,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2840120.0, ans=15.0 2024-08-14 20:45:38,997 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.891e-02 2024-08-14 20:45:47,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2840220.0, ans=0.1 2024-08-14 20:45:50,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2840320.0, ans=0.125 2024-08-14 20:45:59,824 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 from AS 2024-08-14 20:46:05,335 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8700, loss[loss=0.07815, beats_loss=0.01059, ecapa_loss=0.0001653, whisper_loss=0.06591, over 13906.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001535, whisper_loss=0.09062, over 3934564.75 frames. 
], batch size: 55, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:46:10,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2840420.0, ans=0.0 2024-08-14 20:46:13,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2840420.0, ans=0.1 2024-08-14 20:46:30,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2840520.0, ans=0.125 2024-08-14 20:46:39,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.465e+01 2.655e+01 3.081e+01 6.274e+01, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:46:39,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2840620.0, ans=0.0 2024-08-14 20:46:41,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2840620.0, ans=0.0 2024-08-14 20:46:50,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2024-08-14 20:46:50,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0 2024-08-14 20:46:54,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2840720.0, ans=0.125 2024-08-14 20:47:02,666 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-14 20:47:16,205 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 20:47:17,264 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8750, loss[loss=0.1023, beats_loss=0.009327, ecapa_loss=0.0001206, whisper_loss=0.0918, over 16927.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.000153, whisper_loss=0.09065, over 3913550.70 frames. ], batch size: 62, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:47:31,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2841020.0, ans=0.125 2024-08-14 20:47:52,696 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:48:04,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2841220.0, ans=0.2 2024-08-14 20:48:18,618 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 from AS 2024-08-14 20:48:25,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2024-08-14 20:48:29,935 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8800, loss[loss=0.09227, beats_loss=0.01235, ecapa_loss=0.0001556, whisper_loss=0.07837, over 16926.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001528, whisper_loss=0.09088, over 3914491.05 frames. ], batch size: 72, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:48:45,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2841520.0, ans=0.0 2024-08-14 20:49:02,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.90 vs. 
limit=15.0 2024-08-14 20:49:04,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=22.5 2024-08-14 20:49:05,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.267e+01 2.535e+01 2.766e+01 4.137e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-14 20:49:43,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8850, loss[loss=0.09174, beats_loss=0.0119, ecapa_loss=0.0001242, whisper_loss=0.07861, over 14956.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001523, whisper_loss=0.09076, over 3900339.20 frames. ], batch size: 57, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:49:46,375 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 33 from Vox, 28 from AS 2024-08-14 20:49:50,617 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-14 20:49:51,270 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-14 20:50:10,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2842120.0, ans=0.125 2024-08-14 20:50:30,281 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 from AS 2024-08-14 20:50:44,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. 
limit=6.0 2024-08-14 20:50:46,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=2842320.0, ans=0.02 2024-08-14 20:50:54,287 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8900, loss[loss=0.0995, beats_loss=0.01148, ecapa_loss=0.0001497, whisper_loss=0.08652, over 22457.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001529, whisper_loss=0.09112, over 3867101.20 frames. ], batch size: 90, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:51:08,676 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 20:51:29,104 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.335e+01 2.555e+01 2.826e+01 4.520e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-14 20:51:41,025 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 20:51:54,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2842820.0, ans=0.2 2024-08-14 20:52:06,223 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 8950, loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001681, whisper_loss=0.09095, over 22124.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001518, whisper_loss=0.09047, over 3865468.90 frames. ], batch size: 88, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:52:11,268 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-14 20:52:47,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2843120.0, ans=0.0 2024-08-14 20:52:53,005 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 20:53:07,532 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 20:53:18,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9000, loss[loss=0.09658, beats_loss=0.009549, ecapa_loss=0.0001728, whisper_loss=0.08531, over 20712.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001526, whisper_loss=0.09094, over 3875037.73 frames. ], batch size: 87, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:53:18,858 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 20:54:01,045 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005268, whisper_loss=0.2474, over 922467.00 frames. 2024-08-14 20:54:16,910 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-14 20:55:09,883 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3326, 3.0664, 3.3518, 3.2814], device='cuda:1') 2024-08-14 20:55:14,056 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7439, 2.2421, 2.5104, 2.3178], device='cuda:1') 2024-08-14 20:56:16,660 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on AT_audioset: loss=0.0236, beats_loss=0.0236, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 20:56:16,664 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 20:56:21,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2843420.0, ans=0.0 2024-08-14 20:56:36,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2843520.0, ans=0.125 2024-08-14 20:56:47,866 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 20:56:51,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.250e+01 2.510e+01 2.872e+01 4.631e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-14 20:56:52,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2843620.0, ans=0.025 2024-08-14 20:57:05,062 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-14 20:57:14,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2843820.0, ans=0.0 2024-08-14 20:57:14,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2843820.0, ans=0.125 2024-08-14 20:57:29,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9050, loss[loss=0.08382, beats_loss=0.01114, ecapa_loss=0.0001493, whisper_loss=0.07119, over 15176.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001521, whisper_loss=0.09129, over 3867088.51 frames. 
], batch size: 60, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:57:46,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2844020.0, ans=0.0 2024-08-14 20:57:55,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2844020.0, ans=0.04949747468305833 2024-08-14 20:57:59,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2844120.0, ans=0.125 2024-08-14 20:58:03,850 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 20:58:16,924 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 33 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 20:58:20,267 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 20:58:22,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2844220.0, ans=0.0 2024-08-14 20:58:27,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2844320.0, ans=0.1 2024-08-14 20:58:28,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2844320.0, ans=0.0 2024-08-14 20:58:34,674 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 20:58:43,187 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9100, loss[loss=0.1045, beats_loss=0.01216, ecapa_loss=0.0001347, whisper_loss=0.09104, over 21604.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.09075, over 3827302.97 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:58:43,377 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-14 20:59:03,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2844520.0, ans=0.125 2024-08-14 20:59:11,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2844620.0, ans=0.0 2024-08-14 20:59:15,190 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 20:59:16,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2844620.0, ans=0.0 2024-08-14 20:59:17,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.420e+01 2.655e+01 2.997e+01 1.110e+02, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:59:25,028 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 20:59:43,927 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 32 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 20:59:44,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2844820.0, ans=0.125 2024-08-14 20:59:48,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2844820.0, ans=0.09899494936611666 2024-08-14 20:59:52,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2844820.0, ans=0.125 2024-08-14 20:59:55,164 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9150, loss[loss=0.1076, beats_loss=0.01127, ecapa_loss=0.0001526, whisper_loss=0.09478, over 18932.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001518, whisper_loss=0.09123, over 3842678.76 frames. 
], batch size: 74, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:59:57,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2844920.0, ans=0.0 2024-08-14 21:00:00,261 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-14 21:00:17,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2845020.0, ans=0.2 2024-08-14 21:00:21,627 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 21:00:42,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-14 21:00:58,956 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 21:01:03,216 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 21:01:06,098 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 21:01:07,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9200, loss[loss=0.1012, beats_loss=0.009748, ecapa_loss=0.0001496, whisper_loss=0.08995, over 21407.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001514, whisper_loss=0.09076, over 3859272.22 frames. ], batch size: 84, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:01:26,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.78 vs. 
limit=15.0 2024-08-14 21:01:30,174 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:01:41,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.415e+01 2.661e+01 2.941e+01 2.596e+02, threshold=5.321e+01, percent-clipped=3.0 2024-08-14 21:01:55,426 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 21:02:10,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2845820.0, ans=0.2 2024-08-14 21:02:18,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9250, loss[loss=0.08524, beats_loss=0.01664, ecapa_loss=0.0001711, whisper_loss=0.06689, over 20854.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001525, whisper_loss=0.09117, over 3891359.04 frames. ], batch size: 91, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:02:25,416 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 21:02:50,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=12.0 2024-08-14 21:03:00,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2846120.0, ans=0.125 2024-08-14 21:03:00,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2846120.0, ans=0.0 2024-08-14 21:03:20,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2846320.0, ans=0.2 2024-08-14 21:03:33,445 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9300, loss[loss=0.1001, beats_loss=0.01279, ecapa_loss=0.0001272, whisper_loss=0.08602, over 22514.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001536, whisper_loss=0.09091, over 3873379.55 frames. ], batch size: 91, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:03:33,633 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 21:03:38,603 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 21:03:49,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2846520.0, ans=0.1 2024-08-14 21:04:06,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2846620.0, ans=0.125 2024-08-14 21:04:08,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.351e+01 2.533e+01 2.913e+01 3.870e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-14 21:04:11,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2024-08-14 21:04:27,562 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 21:04:30,712 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 21:04:48,424 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9350, loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001485, whisper_loss=0.09163, over 22840.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.000153, whisper_loss=0.09109, over 3868166.88 frames. ], batch size: 92, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:04:53,286 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 21:04:56,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2846920.0, ans=0.1 2024-08-14 21:05:10,782 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 21:05:16,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2847120.0, ans=0.0 2024-08-14 21:05:21,319 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 21:05:28,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2847120.0, ans=0.125 2024-08-14 21:05:28,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2847120.0, ans=0.1 2024-08-14 21:05:37,304 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-14 21:05:47,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2847320.0, ans=0.0 2024-08-14 21:05:59,063 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 21:06:01,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9400, loss[loss=0.08055, beats_loss=0.01381, ecapa_loss=0.0001361, whisper_loss=0.06539, over 18042.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001534, whisper_loss=0.09124, over 3873870.25 frames. 
], batch size: 74, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:06:11,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2847420.0, ans=0.0 2024-08-14 21:06:26,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2847520.0, ans=0.0 2024-08-14 21:06:38,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.317e+01 2.592e+01 2.927e+01 3.881e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-14 21:06:40,077 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.430e+01 2024-08-14 21:07:14,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2024-08-14 21:07:15,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9450, loss[loss=0.1288, beats_loss=0.007114, ecapa_loss=0.0001529, whisper_loss=0.1201, over 19404.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001533, whisper_loss=0.09072, over 3864766.71 frames. ], batch size: 71, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:07:21,320 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 21:07:28,438 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 21:07:41,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.66 vs. limit=22.5 2024-08-14 21:07:49,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2848120.0, ans=0.2 2024-08-14 21:08:01,434 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 21:08:04,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2848220.0, ans=0.125 2024-08-14 21:08:16,136 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2024-08-14 21:08:18,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2848320.0, ans=0.125 2024-08-14 21:08:28,287 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9500, loss[loss=0.1108, beats_loss=0.00952, ecapa_loss=0.000178, whisper_loss=0.09953, over 20064.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001536, whisper_loss=0.0902, over 3870965.18 frames. ], batch size: 83, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:08:31,443 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-14 21:08:58,436 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 30 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 21:09:02,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2848620.0, ans=0.125 2024-08-14 21:09:03,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.327e+01 2.619e+01 2.918e+01 1.778e+02, threshold=5.238e+01, percent-clipped=2.0 2024-08-14 21:09:18,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2848720.0, ans=0.125 2024-08-14 21:09:27,364 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 21:09:31,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. 
limit=15.0 2024-08-14 21:09:41,847 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.38 vs. limit=15.0 2024-08-14 21:09:42,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9550, loss[loss=0.09514, beats_loss=0.009979, ecapa_loss=0.0001964, whisper_loss=0.0832, over 15627.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001545, whisper_loss=0.09032, over 3883014.71 frames. ], batch size: 64, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:09:44,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2848920.0, ans=0.0 2024-08-14 21:09:45,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2848920.0, ans=0.125 2024-08-14 21:09:57,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=22.5 2024-08-14 21:10:00,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2849020.0, ans=0.0 2024-08-14 21:10:00,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2024-08-14 21:10:01,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2849020.0, ans=0.125 2024-08-14 21:10:21,673 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 21:10:31,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2849220.0, ans=0.2 2024-08-14 21:10:36,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2849220.0, ans=0.125 2024-08-14 21:10:42,569 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=15.0 2024-08-14 21:10:47,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2849320.0, ans=0.2 2024-08-14 21:10:53,706 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9600, loss[loss=0.09138, beats_loss=0.01167, ecapa_loss=0.0001694, whisper_loss=0.07802, over 17995.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001549, whisper_loss=0.09026, over 3919195.81 frames. ], batch size: 76, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:11:03,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2849420.0, ans=0.125 2024-08-14 21:11:08,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2849520.0, ans=0.125 2024-08-14 21:11:09,458 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
21 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-14 21:11:22,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2849620.0, ans=0.125 2024-08-14 21:11:27,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2849620.0, ans=0.0 2024-08-14 21:11:29,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.324e+01 2.593e+01 2.905e+01 4.004e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 21:11:53,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2849820.0, ans=0.0 2024-08-14 21:12:00,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2849820.0, ans=0.0 2024-08-14 21:12:05,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-08-14 21:12:07,668 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9650, loss[loss=0.09345, beats_loss=0.01144, ecapa_loss=0.0001408, whisper_loss=0.0806, over 16310.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001555, whisper_loss=0.09029, over 3880417.36 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:12:09,411 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
23 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 21:12:39,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2850120.0, ans=0.5 2024-08-14 21:12:45,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2850120.0, ans=0.125 2024-08-14 21:13:11,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2850320.0, ans=0.2 2024-08-14 21:13:20,529 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9700, loss[loss=0.09946, beats_loss=0.0112, ecapa_loss=0.0001303, whisper_loss=0.08696, over 15018.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001559, whisper_loss=0.09052, over 3835316.98 frames. ], batch size: 58, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:13:34,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2850520.0, ans=0.07 2024-08-14 21:13:37,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2850520.0, ans=0.125 2024-08-14 21:13:41,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2024-08-14 21:13:56,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.346e+01 2.562e+01 2.964e+01 3.831e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-14 21:14:10,045 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 12 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 21:14:34,906 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9750, loss[loss=0.09458, beats_loss=0.01164, ecapa_loss=0.0001473, whisper_loss=0.08146, over 15541.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001552, whisper_loss=0.09045, over 3813496.48 frames. ], batch size: 64, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:14:35,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2850920.0, ans=0.125 2024-08-14 21:14:47,649 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2024-08-14 21:14:55,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2851020.0, ans=0.0 2024-08-14 21:15:13,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2851120.0, ans=0.0 2024-08-14 21:15:26,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2851220.0, ans=0.0 2024-08-14 21:15:27,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2851220.0, ans=0.07 2024-08-14 21:15:35,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2851320.0, ans=0.1 2024-08-14 21:15:49,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9800, loss[loss=0.08789, beats_loss=0.0122, ecapa_loss=0.0001512, whisper_loss=0.07418, over 18695.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001545, whisper_loss=0.09024, over 3803435.50 frames. 
], batch size: 76, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:15:58,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2851420.0, ans=0.125 2024-08-14 21:16:02,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851420.0, ans=0.1 2024-08-14 21:16:11,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2851520.0, ans=0.0 2024-08-14 21:16:16,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2851520.0, ans=0.125 2024-08-14 21:16:25,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.297e+01 2.616e+01 2.876e+01 8.897e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 21:16:31,786 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 21:16:58,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851820.0, ans=0.1 2024-08-14 21:16:58,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2851820.0, ans=0.125 2024-08-14 21:17:03,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9850, loss[loss=0.1259, beats_loss=0.008764, ecapa_loss=0.0001411, whisper_loss=0.1157, over 22593.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01077, ecapa_loss=0.0001546, whisper_loss=0.09021, over 3818233.45 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:17:19,027 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 21:17:20,358 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
15 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 21:17:39,285 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 21:17:41,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=2852120.0, ans=22.5 2024-08-14 21:18:18,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9900, loss[loss=0.1068, beats_loss=0.01044, ecapa_loss=0.0001764, whisper_loss=0.09457, over 21044.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.0001536, whisper_loss=0.09019, over 3871519.35 frames. ], batch size: 83, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:18:21,729 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 21:18:45,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2852520.0, ans=0.0 2024-08-14 21:18:54,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.391e+01 2.621e+01 2.869e+01 9.364e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-14 21:19:23,434 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 18 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 21:19:31,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2852820.0, ans=0.125 2024-08-14 21:19:35,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 9950, loss[loss=0.103, beats_loss=0.01123, ecapa_loss=0.0001339, whisper_loss=0.09047, over 18656.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001534, whisper_loss=0.09011, over 3871109.24 frames. ], batch size: 74, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:19:35,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.78 vs. 
limit=15.0 2024-08-14 21:19:39,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2852920.0, ans=0.1 2024-08-14 21:20:04,904 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-14 21:20:25,866 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 21 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-14 21:20:32,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2853220.0, ans=0.125 2024-08-14 21:20:51,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-14 21:20:51,962 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10000, loss[loss=0.1231, beats_loss=0.009339, ecapa_loss=0.0001853, whisper_loss=0.1119, over 21682.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001531, whisper_loss=0.09015, over 3862285.64 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:20:53,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2853420.0, ans=0.1 2024-08-14 21:21:06,479 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 21:21:12,455 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 21:21:21,354 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-14 21:21:26,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2853620.0, ans=0.125 2024-08-14 21:21:28,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.381e+01 2.626e+01 2.960e+01 1.740e+02, threshold=5.252e+01, percent-clipped=1.0 2024-08-14 21:22:01,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2853820.0, ans=0.0 2024-08-14 21:22:08,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10050, loss[loss=0.09031, beats_loss=0.01188, ecapa_loss=0.0001453, whisper_loss=0.07698, over 20128.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001536, whisper_loss=0.09041, over 3897047.93 frames. ], batch size: 83, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:22:12,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2853920.0, ans=0.05 2024-08-14 21:22:12,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2853920.0, ans=0.125 2024-08-14 21:22:15,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. 
limit=15.0 2024-08-14 21:22:35,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2854020.0, ans=0.2 2024-08-14 21:22:38,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2854020.0, ans=10.0 2024-08-14 21:22:45,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2854120.0, ans=0.125 2024-08-14 21:22:56,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2854220.0, ans=10.0 2024-08-14 21:23:07,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.54 vs. limit=10.0 2024-08-14 21:23:17,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2854320.0, ans=0.0 2024-08-14 21:23:26,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2854320.0, ans=0.125 2024-08-14 21:23:30,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10100, loss[loss=0.1119, beats_loss=0.01052, ecapa_loss=0.0001376, whisper_loss=0.1, over 22530.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001541, whisper_loss=0.09067, over 3900628.13 frames. ], batch size: 89, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:23:35,815 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 21:23:36,522 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2024-08-14 21:23:46,105 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 21:24:01,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2854520.0, ans=0.125 2024-08-14 21:24:10,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.362e+01 2.668e+01 2.989e+01 1.433e+02, threshold=5.336e+01, percent-clipped=3.0 2024-08-14 21:24:15,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2854620.0, ans=0.0 2024-08-14 21:24:46,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2854820.0, ans=0.0 2024-08-14 21:24:46,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2854820.0, ans=0.125 2024-08-14 21:24:51,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2854920.0, ans=0.125 2024-08-14 21:24:52,684 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10150, loss[loss=0.118, beats_loss=0.009497, ecapa_loss=0.0001848, whisper_loss=0.1066, over 17670.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001545, whisper_loss=0.09086, over 3910947.20 frames. ], batch size: 76, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:25:16,840 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 21:25:20,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2855020.0, ans=0.1 2024-08-14 21:25:32,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2855120.0, ans=0.125 2024-08-14 21:25:43,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.51 vs. 
limit=12.0 2024-08-14 21:25:51,742 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 23 from LS+wenet, 21 from Vox, 13 fro AS 2024-08-14 21:25:56,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2855320.0, ans=0.125 2024-08-14 21:26:03,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2855320.0, ans=0.0 2024-08-14 21:26:10,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10200, loss[loss=0.1062, beats_loss=0.01066, ecapa_loss=0.0001515, whisper_loss=0.09406, over 18098.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001552, whisper_loss=0.09077, over 3874769.99 frames. ], batch size: 72, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:26:12,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2855420.0, ans=0.0 2024-08-14 21:26:33,722 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:26:45,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2855620.0, ans=0.125 2024-08-14 21:26:46,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.377e+01 2.660e+01 3.071e+01 4.492e+01, threshold=5.321e+01, percent-clipped=0.0 2024-08-14 21:27:00,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-14 21:27:07,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2855720.0, ans=0.0 2024-08-14 21:27:15,721 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 21:27:15,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2855820.0, ans=0.2 2024-08-14 21:27:23,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10250, loss[loss=0.1044, beats_loss=0.01031, ecapa_loss=0.0001554, whisper_loss=0.09257, over 21715.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001545, whisper_loss=0.09095, over 3893915.95 frames. ], batch size: 87, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:27:44,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2856020.0, ans=0.05 2024-08-14 21:27:48,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2856020.0, ans=0.0 2024-08-14 21:28:12,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2856220.0, ans=0.0 2024-08-14 21:28:23,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2856320.0, ans=0.1 2024-08-14 21:28:31,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2856320.0, ans=0.1 2024-08-14 21:28:38,197 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10300, loss[loss=0.1136, beats_loss=0.008937, ecapa_loss=0.0001365, whisper_loss=0.1033, over 14424.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.0001547, whisper_loss=0.09139, over 3866377.99 frames. ], batch size: 54, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:28:51,665 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 21:29:14,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.284e+01 2.585e+01 2.983e+01 4.241e+01, threshold=5.169e+01, percent-clipped=0.0 2024-08-14 21:29:29,529 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-14 21:29:44,506 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 14 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 21:29:53,025 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10350, loss[loss=0.1225, beats_loss=0.008112, ecapa_loss=0.0001371, whisper_loss=0.113, over 15044.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001543, whisper_loss=0.09138, over 3862018.65 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:29:53,377 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-14 21:30:12,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2857020.0, ans=0.0 2024-08-14 21:30:23,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2857120.0, ans=0.0 2024-08-14 21:30:36,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-14 21:30:52,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2857320.0, ans=0.125 2024-08-14 21:31:31,629 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10400, loss[loss=0.09089, beats_loss=0.009458, ecapa_loss=0.0001431, whisper_loss=0.08, over 14090.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.000155, whisper_loss=0.09079, over 3853843.94 frames. 
], batch size: 53, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:31:44,866 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-14 21:31:45,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2857420.0, ans=0.125 2024-08-14 21:31:51,881 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 21:31:59,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2857520.0, ans=0.125 2024-08-14 21:32:11,485 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-14 21:32:14,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.400e+01 2.611e+01 2.963e+01 4.216e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-14 21:32:20,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2857620.0, ans=0.0 2024-08-14 21:32:39,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=22.5 2024-08-14 21:32:42,205 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.233e+00 2024-08-14 21:32:51,647 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 18 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 21:32:58,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0 2024-08-14 21:32:59,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10450, loss[loss=0.1013, beats_loss=0.01161, ecapa_loss=0.000128, whisper_loss=0.08841, over 20663.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001539, whisper_loss=0.09019, over 3850255.82 frames. ], batch size: 80, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:33:11,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2857920.0, ans=0.0 2024-08-14 21:33:16,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2858020.0, ans=0.0 2024-08-14 21:33:19,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2858020.0, ans=0.125 2024-08-14 21:33:47,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2024-08-14 21:34:04,214 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:34:09,030 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 21:34:24,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2858320.0, ans=0.1 2024-08-14 21:34:29,170 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10500, loss[loss=0.1109, beats_loss=0.009827, ecapa_loss=0.0001764, whisper_loss=0.09928, over 17931.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001551, whisper_loss=0.08996, over 3825396.63 frames. ], batch size: 73, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:34:38,324 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 21:34:46,284 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
30 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 21:34:49,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2858520.0, ans=0.125 2024-08-14 21:34:56,467 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 21:35:10,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2858620.0, ans=0.1 2024-08-14 21:35:11,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.314e+01 2.587e+01 2.967e+01 4.494e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-14 21:35:12,921 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 21:35:43,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2858820.0, ans=0.2 2024-08-14 21:35:48,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2858820.0, ans=0.125 2024-08-14 21:35:56,190 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10550, loss[loss=0.07955, beats_loss=0.01355, ecapa_loss=0.000144, whisper_loss=0.06455, over 20114.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01072, ecapa_loss=0.0001548, whisper_loss=0.08931, over 3812672.95 frames. 
], batch size: 83, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:36:22,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2859020.0, ans=0.125 2024-08-14 21:36:29,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2859020.0, ans=0.125 2024-08-14 21:36:34,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2859120.0, ans=10.0 2024-08-14 21:36:41,124 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 21:36:58,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2859220.0, ans=0.125 2024-08-14 21:36:59,164 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 21:37:10,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2859320.0, ans=0.125 2024-08-14 21:37:25,092 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10600, loss[loss=0.09699, beats_loss=0.01309, ecapa_loss=0.0001514, whisper_loss=0.08238, over 21506.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01079, ecapa_loss=0.0001532, whisper_loss=0.08903, over 3864623.44 frames. 
], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:37:29,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2859420.0, ans=0.0 2024-08-14 21:37:38,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2859420.0, ans=0.125 2024-08-14 21:38:07,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.377e+01 2.615e+01 3.017e+01 5.904e+01, threshold=5.231e+01, percent-clipped=2.0 2024-08-14 21:38:11,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2859620.0, ans=0.125 2024-08-14 21:38:29,180 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-14 21:38:37,532 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:38:41,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2859820.0, ans=0.1 2024-08-14 21:38:52,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10650, loss[loss=0.09932, beats_loss=0.00956, ecapa_loss=0.0001589, whisper_loss=0.08817, over 14950.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0107, ecapa_loss=0.000153, whisper_loss=0.08941, over 3822159.28 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:38:59,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2859920.0, ans=0.1 2024-08-14 21:39:02,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-14 21:39:22,239 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 21:39:36,631 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-14 21:39:37,885 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 21:39:43,280 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.035e-03 2024-08-14 21:39:48,171 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-14 21:40:15,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2860420.0, ans=0.125 2024-08-14 21:40:15,851 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10700, loss[loss=0.1015, beats_loss=0.01142, ecapa_loss=0.0001607, whisper_loss=0.08849, over 20234.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01075, ecapa_loss=0.0001515, whisper_loss=0.08966, over 3830854.70 frames. ], batch size: 85, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:40:28,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2860420.0, ans=0.125 2024-08-14 21:40:44,838 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 21:40:57,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.610e+01 2.912e+01 4.621e+02, threshold=5.220e+01, percent-clipped=2.0 2024-08-14 21:41:40,899 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10750, loss[loss=0.1046, beats_loss=0.01228, ecapa_loss=0.0001458, whisper_loss=0.09089, over 17750.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001516, whisper_loss=0.09045, over 3839325.89 frames. 
], batch size: 72, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:41:51,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2860920.0, ans=0.2 2024-08-14 21:41:54,973 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 21:42:11,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2861020.0, ans=0.125 2024-08-14 21:42:38,860 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 21:42:43,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2861220.0, ans=0.125 2024-08-14 21:43:09,082 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10800, loss[loss=0.1155, beats_loss=0.01116, ecapa_loss=0.0001172, whisper_loss=0.1031, over 22289.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.0001522, whisper_loss=0.09145, over 3891231.15 frames. ], batch size: 85, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:43:14,125 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 21:43:19,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2861420.0, ans=0.125 2024-08-14 21:43:23,867 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-14 21:43:27,416 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 21:43:49,122 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.222e+01 2.555e+01 2.864e+01 4.186e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-14 21:44:23,952 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 21:44:25,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2861820.0, ans=0.0 2024-08-14 21:44:34,058 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10850, loss[loss=0.09298, beats_loss=0.01425, ecapa_loss=0.0001203, whisper_loss=0.07753, over 15635.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001519, whisper_loss=0.0909, over 3900021.76 frames. ], batch size: 62, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:44:36,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2861920.0, ans=0.1 2024-08-14 21:44:52,288 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 21:44:59,232 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 21:45:08,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2862120.0, ans=0.125 2024-08-14 21:45:10,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2862120.0, ans=0.125 2024-08-14 21:45:13,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2024-08-14 21:45:24,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=12.0 2024-08-14 21:45:27,128 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 21:45:32,436 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 21:45:59,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10900, loss[loss=0.1171, beats_loss=0.008411, ecapa_loss=0.000162, whisper_loss=0.107, over 21521.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001527, whisper_loss=0.09084, over 3894085.28 frames. ], batch size: 84, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:46:00,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2862420.0, ans=0.0 2024-08-14 21:46:17,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2862520.0, ans=0.125 2024-08-14 21:46:18,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2862520.0, ans=0.125 2024-08-14 21:46:19,943 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 21:46:23,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2862520.0, ans=0.125 2024-08-14 21:46:32,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2862620.0, ans=0.0 2024-08-14 21:46:39,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.360e+01 2.623e+01 2.983e+01 2.734e+02, threshold=5.246e+01, percent-clipped=0.0 2024-08-14 21:46:42,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2862620.0, ans=0.0 2024-08-14 21:46:52,552 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 8 from Vox, 31 fro AS 2024-08-14 21:46:54,818 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
19 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 21:47:04,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-14 21:47:10,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2862820.0, ans=0.125 2024-08-14 21:47:13,158 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 21:47:25,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 10950, loss[loss=0.1027, beats_loss=0.009094, ecapa_loss=0.000148, whisper_loss=0.09215, over 20919.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001531, whisper_loss=0.09093, over 3917742.12 frames. ], batch size: 83, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:47:25,691 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 21:47:26,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=22.5 2024-08-14 21:47:49,952 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-14 21:48:01,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2863120.0, ans=0.1 2024-08-14 21:48:08,250 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 31 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-14 21:48:13,712 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 21:48:15,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2863220.0, ans=0.07 2024-08-14 21:48:36,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2863320.0, ans=0.05 2024-08-14 21:48:43,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2863320.0, ans=0.125 2024-08-14 21:48:50,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11000, loss[loss=0.09736, beats_loss=0.01028, ecapa_loss=0.0001776, whisper_loss=0.0853, over 21642.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001538, whisper_loss=0.09069, over 3934554.67 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:48:52,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2863420.0, ans=0.125 2024-08-14 21:48:55,470 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 21:48:56,817 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 21:48:58,761 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 21:49:16,710 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=12.0 2024-08-14 21:49:22,667 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 21:49:26,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2863620.0, ans=0.1 2024-08-14 21:49:30,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.380e+01 2.630e+01 2.844e+01 1.265e+02, threshold=5.261e+01, percent-clipped=2.0 2024-08-14 21:49:40,569 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 33 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 21:49:41,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2024-08-14 21:49:47,227 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-14 21:49:54,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2863720.0, ans=0.125 2024-08-14 21:49:57,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2024-08-14 21:50:15,514 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11050, loss[loss=0.1193, beats_loss=0.008861, ecapa_loss=0.0001832, whisper_loss=0.1086, over 22631.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001521, whisper_loss=0.09118, over 3928461.78 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:50:15,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2863920.0, ans=0.0 2024-08-14 21:50:22,154 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
14 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 21:50:34,144 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04806042090058327, model_norm_threshold=52.6092414855957 2024-08-14 21:50:34,331 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.750e+05, grad_sumsq=3.750e+05, orig_rms_sq=1.000e+00 2024-08-14 21:50:40,996 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-14 21:50:51,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2864120.0, ans=0.1 2024-08-14 21:51:04,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2864220.0, ans=0.1 2024-08-14 21:51:10,126 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.891e+01 2024-08-14 21:51:15,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2864220.0, ans=0.2 2024-08-14 21:51:30,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2864320.0, ans=0.125 2024-08-14 21:51:39,821 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11100, loss[loss=0.08292, beats_loss=0.01485, ecapa_loss=0.0001405, whisper_loss=0.06666, over 20814.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.000152, whisper_loss=0.09108, over 3920832.74 frames. ], batch size: 90, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:51:51,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.54 vs. 
limit=10.0 2024-08-14 21:51:59,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2864520.0, ans=0.125 2024-08-14 21:52:04,701 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 35 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 21:52:19,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2864620.0, ans=0.125 2024-08-14 21:52:19,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.327e+01 2.589e+01 2.870e+01 1.095e+03, threshold=5.179e+01, percent-clipped=2.0 2024-08-14 21:52:27,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2864620.0, ans=0.1 2024-08-14 21:52:34,248 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 21:52:39,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2864720.0, ans=0.07 2024-08-14 21:53:04,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11150, loss[loss=0.1036, beats_loss=0.007292, ecapa_loss=0.0002201, whisper_loss=0.09414, over 19112.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001526, whisper_loss=0.09117, over 3912059.83 frames. 
], batch size: 81, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:53:18,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2864920.0, ans=0.0 2024-08-14 21:53:19,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2864920.0, ans=0.0 2024-08-14 21:53:22,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2865020.0, ans=0.2 2024-08-14 21:53:25,192 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 21:53:43,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2865120.0, ans=0.125 2024-08-14 21:53:57,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2865220.0, ans=0.125 2024-08-14 21:54:20,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2865320.0, ans=0.125 2024-08-14 21:54:28,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11200, loss[loss=0.09716, beats_loss=0.01099, ecapa_loss=0.0001418, whisper_loss=0.08475, over 23241.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.000153, whisper_loss=0.09094, over 3891111.47 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:54:35,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2865420.0, ans=0.125 2024-08-14 21:54:36,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2865420.0, ans=0.2 2024-08-14 21:54:39,868 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
37 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 21:54:42,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2865420.0, ans=0.125 2024-08-14 21:54:59,966 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 21:55:01,582 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-14 21:55:05,015 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 21:55:07,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.313e+01 2.527e+01 2.769e+01 5.053e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-14 21:55:11,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2865620.0, ans=0.0 2024-08-14 21:55:13,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2865620.0, ans=0.125 2024-08-14 21:55:18,199 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 21:55:23,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=2865720.0, ans=0.02 2024-08-14 21:55:30,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2865720.0, ans=0.125 2024-08-14 21:55:32,518 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.27 vs. limit=10.0 2024-08-14 21:55:52,374 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11250, loss[loss=0.1095, beats_loss=0.01189, ecapa_loss=0.0001417, whisper_loss=0.0962, over 17820.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001528, whisper_loss=0.09126, over 3886854.87 frames. ], batch size: 72, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:55:52,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2865920.0, ans=0.125 2024-08-14 21:55:52,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2865920.0, ans=0.2 2024-08-14 21:55:56,564 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 21:56:05,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2865920.0, ans=0.125 2024-08-14 21:56:07,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2865920.0, ans=0.0 2024-08-14 21:56:26,007 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 21:56:31,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=10.0 2024-08-14 21:56:32,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2866120.0, ans=0.125 2024-08-14 21:57:18,221 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11300, loss[loss=0.1139, beats_loss=0.009172, ecapa_loss=0.0001785, whisper_loss=0.1029, over 22078.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001524, whisper_loss=0.091, over 3901256.41 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:57:21,734 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
22 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 21:57:29,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2866420.0, ans=0.125 2024-08-14 21:57:34,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2866520.0, ans=0.0 2024-08-14 21:57:41,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866520.0, ans=0.1 2024-08-14 21:57:54,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2866620.0, ans=0.0 2024-08-14 21:57:54,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2024-08-14 21:57:56,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2866620.0, ans=0.125 2024-08-14 21:57:58,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.341e+01 2.610e+01 2.928e+01 1.579e+02, threshold=5.221e+01, percent-clipped=2.0 2024-08-14 21:57:59,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2866620.0, ans=0.125 2024-08-14 21:58:25,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2866820.0, ans=0.125 2024-08-14 21:58:41,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11350, loss[loss=0.09464, beats_loss=0.01384, ecapa_loss=0.0001546, whisper_loss=0.07926, over 18900.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01055, ecapa_loss=0.0001519, whisper_loss=0.09176, over 3919440.50 frames. 
], batch size: 77, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:58:42,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2866920.0, ans=0.0 2024-08-14 21:58:56,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2866920.0, ans=0.125 2024-08-14 21:58:59,125 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 21:59:12,070 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 21:59:18,811 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.244e+01 2024-08-14 21:59:30,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2867120.0, ans=0.125 2024-08-14 22:00:08,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11400, loss[loss=0.12, beats_loss=0.009751, ecapa_loss=0.0001406, whisper_loss=0.1089, over 24056.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001518, whisper_loss=0.09157, over 3914416.62 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:00:14,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2867420.0, ans=0.125 2024-08-14 22:00:29,319 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-14 22:00:32,620 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
19 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 22:00:49,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.381e+01 2.566e+01 2.831e+01 4.188e+01, threshold=5.133e+01, percent-clipped=0.0 2024-08-14 22:00:57,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2867720.0, ans=0.0 2024-08-14 22:01:10,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=15.0 2024-08-14 22:01:18,004 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 22:01:31,227 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11450, loss[loss=0.1106, beats_loss=0.01068, ecapa_loss=0.0001576, whisper_loss=0.09834, over 22930.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001513, whisper_loss=0.09132, over 3927318.08 frames. ], batch size: 93, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:01:33,094 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 22:01:34,846 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 22:01:44,901 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 22:01:47,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.70 vs. 
limit=12.0 2024-08-14 22:02:25,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2868220.0, ans=0.1 2024-08-14 22:02:25,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2868220.0, ans=0.0 2024-08-14 22:02:37,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2868320.0, ans=0.125 2024-08-14 22:02:46,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2868320.0, ans=0.125 2024-08-14 22:02:53,153 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11500, loss[loss=0.09987, beats_loss=0.009414, ecapa_loss=0.0001761, whisper_loss=0.0887, over 16653.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001518, whisper_loss=0.09188, over 3925528.42 frames. ], batch size: 71, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:02:57,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2868420.0, ans=0.125 2024-08-14 22:03:00,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2868420.0, ans=0.0 2024-08-14 22:03:20,074 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
15 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 22:03:25,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2868620.0, ans=0.125 2024-08-14 22:03:34,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.455e+01 2.723e+01 3.029e+01 4.016e+01, threshold=5.445e+01, percent-clipped=0.0 2024-08-14 22:03:41,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2868620.0, ans=0.2 2024-08-14 22:03:42,463 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 22:03:45,367 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 22:03:52,453 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 22:03:55,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2868720.0, ans=0.125 2024-08-14 22:04:01,036 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 22:04:05,766 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-14 22:04:11,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2868820.0, ans=0.1 2024-08-14 22:04:18,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11550, loss[loss=0.07504, beats_loss=0.01064, ecapa_loss=0.0001662, whisper_loss=0.06274, over 17369.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01056, ecapa_loss=0.0001527, whisper_loss=0.09224, over 3929002.88 frames. ], batch size: 74, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:04:34,986 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 22:05:13,531 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 22:05:24,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2869320.0, ans=0.125 2024-08-14 22:05:33,027 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=12.0 2024-08-14 22:05:36,974 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-14 22:05:39,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=12.0 2024-08-14 22:05:39,785 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11600, loss[loss=0.09113, beats_loss=0.01093, ecapa_loss=0.0001496, whisper_loss=0.0787, over 16407.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.0001535, whisper_loss=0.09177, over 3928676.99 frames. ], batch size: 65, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:05:43,498 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 22:05:49,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2869420.0, ans=0.0 2024-08-14 22:06:08,939 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 22:06:10,683 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-14 22:06:11,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. 
limit=15.0 2024-08-14 22:06:19,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.332e+01 2.648e+01 3.162e+01 2.380e+02, threshold=5.297e+01, percent-clipped=2.0 2024-08-14 22:06:20,551 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 22:06:25,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2869620.0, ans=0.0 2024-08-14 22:07:00,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11650, loss[loss=0.08523, beats_loss=0.01205, ecapa_loss=0.0001453, whisper_loss=0.07172, over 21409.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001543, whisper_loss=0.09145, over 3947306.93 frames. ], batch size: 83, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:07:11,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2869920.0, ans=0.0 2024-08-14 22:07:22,911 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 22:07:25,910 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 22:07:31,832 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 22:07:36,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2870120.0, ans=0.125 2024-08-14 22:07:40,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2870120.0, ans=0.125 2024-08-14 22:07:57,759 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 22:08:22,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2870420.0, ans=0.125 2024-08-14 22:08:23,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11700, loss[loss=0.1029, beats_loss=0.009393, ecapa_loss=0.0001607, whisper_loss=0.0919, over 21906.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001534, whisper_loss=0.09108, over 3978333.42 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:09:01,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2870620.0, ans=0.0 2024-08-14 22:09:06,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.421e+01 2.717e+01 3.040e+01 4.718e+01, threshold=5.433e+01, percent-clipped=0.0 2024-08-14 22:09:08,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2870620.0, ans=0.125 2024-08-14 22:09:24,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2870720.0, ans=0.1 2024-08-14 22:09:26,030 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 22:09:35,712 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 22:09:41,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2870820.0, ans=0.125 2024-08-14 22:09:44,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. 
limit=6.0 2024-08-14 22:09:45,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11750, loss[loss=0.09615, beats_loss=0.01092, ecapa_loss=0.0001577, whisper_loss=0.08365, over 14190.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001526, whisper_loss=0.09134, over 3960362.90 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:09:55,306 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 15 from Vox, 54 fro AS 2024-08-14 22:09:57,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2870920.0, ans=0.0 2024-08-14 22:10:13,353 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 22:10:15,700 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 22:10:25,195 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 22:10:31,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2871120.0, ans=0.0 2024-08-14 22:10:38,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2871120.0, ans=0.125 2024-08-14 22:10:39,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2871120.0, ans=0.125 2024-08-14 22:11:11,091 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-14 22:11:20,569 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-14 22:11:22,284 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11800, loss[loss=0.08983, beats_loss=0.009033, ecapa_loss=0.0001721, whisper_loss=0.07907, over 19939.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001527, whisper_loss=0.09086, over 3952027.90 frames. ], batch size: 82, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:11:22,495 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 22:11:30,351 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.061e+00 2024-08-14 22:11:50,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2871520.0, ans=0.125 2024-08-14 22:12:03,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.341e+01 2.560e+01 2.807e+01 8.705e+01, threshold=5.119e+01, percent-clipped=2.0 2024-08-14 22:12:13,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2871720.0, ans=0.2 2024-08-14 22:12:16,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2871720.0, ans=0.2 2024-08-14 22:12:19,410 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 22:12:23,726 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-14 22:12:23,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2871720.0, ans=0.125 2024-08-14 22:12:26,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=12.0 2024-08-14 22:12:27,889 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 22:12:55,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11850, loss[loss=0.1105, beats_loss=0.0095, ecapa_loss=0.0001351, whisper_loss=0.09963, over 15383.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001536, whisper_loss=0.09081, over 3936247.58 frames. ], batch size: 57, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:13:14,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2871920.0, ans=0.2 2024-08-14 22:13:22,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2872020.0, ans=0.0 2024-08-14 22:13:34,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0 2024-08-14 22:13:51,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2872120.0, ans=10.0 2024-08-14 22:13:55,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2872120.0, ans=0.125 2024-08-14 22:13:56,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-14 22:14:28,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2872320.0, ans=0.125 2024-08-14 22:14:35,583 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 22:14:48,232 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11900, loss[loss=0.1158, beats_loss=0.008974, ecapa_loss=0.0001835, whisper_loss=0.105, over 19041.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001541, whisper_loss=0.09225, over 3958988.49 frames. 
], batch size: 77, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:14:51,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2872420.0, ans=0.1 2024-08-14 22:14:53,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2872420.0, ans=0.07 2024-08-14 22:14:53,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-14 22:15:20,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2872520.0, ans=10.0 2024-08-14 22:15:22,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2872520.0, ans=0.125 2024-08-14 22:15:22,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2872520.0, ans=0.125 2024-08-14 22:15:44,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.256e+01 2.473e+01 2.865e+01 1.430e+02, threshold=4.947e+01, percent-clipped=1.0 2024-08-14 22:15:55,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2872720.0, ans=0.125 2024-08-14 22:16:01,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2872720.0, ans=0.125 2024-08-14 22:16:17,093 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 22:16:18,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-14 22:16:37,238 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 22:16:40,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 11950, loss[loss=0.1096, beats_loss=0.01091, ecapa_loss=0.0001012, whisper_loss=0.09772, over 23503.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001548, whisper_loss=0.09211, over 3921791.04 frames. ], batch size: 85, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:17:00,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2873020.0, ans=0.125 2024-08-14 22:17:12,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5 2024-08-14 22:17:20,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873120.0, ans=0.1 2024-08-14 22:17:25,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=12.0 2024-08-14 22:17:40,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2873120.0, ans=0.09899494936611666 2024-08-14 22:17:49,481 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 22:17:59,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2873220.0, ans=0.125 2024-08-14 22:18:07,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2873320.0, ans=0.2 2024-08-14 22:18:21,957 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12000, loss[loss=0.1229, beats_loss=0.01193, ecapa_loss=0.0001296, whisper_loss=0.1097, over 15145.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001535, whisper_loss=0.092, over 3876941.54 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:18:21,958 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 22:19:04,404 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005404, whisper_loss=0.2466, over 922467.00 frames. 2024-08-14 22:19:20,794 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on SV_voxceleb1: loss=0.004324, beats_loss=0, ecapa_loss=0.0004324, whisper_loss=0, over 939242.00 frames. 2024-08-14 22:21:26,118 INFO [train_multi_KD3.py:1149] (1/4) Epoch 20, validation on AT_audioset: loss=0.02348, beats_loss=0.02348, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 22:21:26,122 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 22:21:35,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2873420.0, ans=0.125 2024-08-14 22:21:46,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2873520.0, ans=0.125 2024-08-14 22:21:51,987 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 34 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 22:21:52,734 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-14 22:21:55,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2873620.0, ans=0.125 2024-08-14 22:21:59,252 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 22:22:03,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.280e+01 2.500e+01 2.714e+01 9.772e+01, threshold=5.000e+01, percent-clipped=1.0 2024-08-14 22:22:11,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2873720.0, ans=0.0 2024-08-14 22:22:33,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2024-08-14 22:22:41,327 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12050, loss[loss=0.1001, beats_loss=0.009354, ecapa_loss=0.0001469, whisper_loss=0.08925, over 17935.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01062, ecapa_loss=0.0001536, whisper_loss=0.09247, over 3874094.17 frames. ], batch size: 70, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:22:41,609 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 22:22:52,068 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 22:22:53,631 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 17 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 22:23:02,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2874020.0, ans=0.125 2024-08-14 22:23:12,510 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-14 22:23:16,782 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 22:23:43,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.98 vs. 
limit=22.5 2024-08-14 22:23:44,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2874320.0, ans=0.125 2024-08-14 22:23:56,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12100, loss[loss=0.0829, beats_loss=0.01216, ecapa_loss=0.0001833, whisper_loss=0.06891, over 13760.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001537, whisper_loss=0.0919, over 3895547.16 frames. ], batch size: 60, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:24:33,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2874620.0, ans=0.125 2024-08-14 22:24:35,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.418e+01 2.574e+01 2.910e+01 4.724e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-14 22:24:40,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2874620.0, ans=0.0 2024-08-14 22:25:13,581 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12150, loss[loss=0.1216, beats_loss=0.008567, ecapa_loss=0.0002044, whisper_loss=0.111, over 21677.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001541, whisper_loss=0.09187, over 3910749.21 frames. ], batch size: 88, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:25:17,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2874920.0, ans=0.125 2024-08-14 22:25:26,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.59 vs. 
limit=15.0 2024-08-14 22:25:27,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2875020.0, ans=0.125 2024-08-14 22:25:32,966 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-14 22:25:43,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2875120.0, ans=0.2 2024-08-14 22:25:45,111 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:26:00,367 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 22:26:00,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-14 22:26:09,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2875220.0, ans=0.0 2024-08-14 22:26:11,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-14 22:26:28,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12200, loss[loss=0.1101, beats_loss=0.009108, ecapa_loss=0.0001529, whisper_loss=0.09946, over 20912.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001548, whisper_loss=0.0919, over 3913510.04 frames. ], batch size: 80, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:26:29,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2875420.0, ans=0.2 2024-08-14 22:26:32,408 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 22:26:42,326 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-14 22:26:44,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2875520.0, ans=0.125 2024-08-14 22:26:47,276 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 22:26:49,270 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2024-08-14 22:26:54,823 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-14 22:27:06,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.621e+01 2.982e+01 1.533e+02, threshold=5.242e+01, percent-clipped=1.0 2024-08-14 22:27:10,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2875620.0, ans=0.125 2024-08-14 22:27:17,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-14 22:27:36,079 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 22:27:41,154 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 22:27:43,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875820.0, ans=0.1 2024-08-14 22:27:45,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12250, loss[loss=0.07169, beats_loss=0.01001, ecapa_loss=0.0001459, whisper_loss=0.06022, over 16970.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.000154, whisper_loss=0.09172, over 3918340.56 frames. ], batch size: 66, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:27:46,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2875920.0, ans=0.1 2024-08-14 22:27:57,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2875920.0, ans=0.0 2024-08-14 22:28:02,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-14 22:28:10,793 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 22:28:24,308 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 22:28:42,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2876220.0, ans=0.125 2024-08-14 22:28:54,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2876320.0, ans=0.0 2024-08-14 22:29:02,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12300, loss[loss=0.09675, beats_loss=0.008768, ecapa_loss=0.0001994, whisper_loss=0.08599, over 16952.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001549, whisper_loss=0.09099, over 3901911.22 frames. ], batch size: 73, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:29:07,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.44 vs. limit=10.0 2024-08-14 22:29:08,908 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 22:29:10,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2876420.0, ans=0.07 2024-08-14 22:29:21,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2876520.0, ans=0.0 2024-08-14 22:29:28,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2876520.0, ans=0.125 2024-08-14 22:29:38,176 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 22:29:39,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.269e+01 2.570e+01 2.894e+01 3.715e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-14 22:29:57,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2876720.0, ans=0.125 2024-08-14 22:29:57,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.70 vs. limit=22.5 2024-08-14 22:29:59,390 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 22:30:02,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2876820.0, ans=0.0 2024-08-14 22:30:02,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2876820.0, ans=0.125 2024-08-14 22:30:07,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2876820.0, ans=0.125 2024-08-14 22:30:13,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2876820.0, ans=0.5 2024-08-14 22:30:17,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12350, loss[loss=0.1312, beats_loss=0.009001, ecapa_loss=0.0001616, whisper_loss=0.1206, over 23757.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001558, whisper_loss=0.09173, over 3888454.92 frames. ], batch size: 89, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:30:20,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-14 22:30:24,210 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 22:30:39,043 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 22:30:42,344 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 22:30:51,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=22.5 2024-08-14 22:31:09,527 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 22:31:15,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-14 22:31:22,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=12.0 2024-08-14 22:31:34,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12400, loss[loss=0.1026, beats_loss=0.01016, ecapa_loss=0.0001447, whisper_loss=0.09094, over 22526.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001548, whisper_loss=0.09136, over 3918901.29 frames. ], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:32:04,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2877620.0, ans=0.0 2024-08-14 22:32:13,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.352e+01 2.633e+01 2.974e+01 1.809e+02, threshold=5.265e+01, percent-clipped=2.0 2024-08-14 22:32:34,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2877820.0, ans=0.125 2024-08-14 22:32:44,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2877820.0, ans=0.125 2024-08-14 22:32:49,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12450, loss[loss=0.1061, beats_loss=0.01164, ecapa_loss=0.0001417, whisper_loss=0.09301, over 21695.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001538, whisper_loss=0.0914, over 3941849.40 frames. ], batch size: 86, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:32:57,549 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 22:32:57,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2877920.0, ans=0.125 2024-08-14 22:33:11,032 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 22:33:32,096 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 22:33:32,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2878120.0, ans=0.125 2024-08-14 22:33:36,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2878220.0, ans=0.125 2024-08-14 22:33:39,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2878220.0, ans=0.125 2024-08-14 22:33:47,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2878220.0, ans=0.0 2024-08-14 22:33:50,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. limit=10.0 2024-08-14 22:34:04,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12500, loss[loss=0.1012, beats_loss=0.009579, ecapa_loss=0.0001513, whisper_loss=0.09011, over 20130.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001529, whisper_loss=0.09157, over 3905900.61 frames. ], batch size: 81, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:34:11,416 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 22:34:27,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2878520.0, ans=0.015 2024-08-14 22:34:38,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2878620.0, ans=0.0 2024-08-14 22:34:43,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.340e+01 2.618e+01 2.965e+01 2.177e+02, threshold=5.235e+01, percent-clipped=3.0 2024-08-14 22:34:54,125 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 22:35:02,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2878720.0, ans=0.2 2024-08-14 22:35:09,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2878820.0, ans=0.1 2024-08-14 22:35:15,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2878820.0, ans=0.125 2024-08-14 22:35:21,063 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12550, loss[loss=0.114, beats_loss=0.01108, ecapa_loss=0.0001472, whisper_loss=0.1015, over 22969.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0106, ecapa_loss=0.0001538, whisper_loss=0.09203, over 3922416.54 frames. ], batch size: 93, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:35:27,406 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
37 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-14 22:35:28,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2878920.0, ans=15.0 2024-08-14 22:35:42,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-14 22:36:12,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2879220.0, ans=0.05 2024-08-14 22:36:15,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2879220.0, ans=0.125 2024-08-14 22:36:31,497 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-14 22:36:35,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12600, loss[loss=0.1089, beats_loss=0.009869, ecapa_loss=0.0001316, whisper_loss=0.09776, over 23137.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01059, ecapa_loss=0.0001547, whisper_loss=0.0918, over 3909892.53 frames. ], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:36:50,537 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 22:37:09,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2879620.0, ans=0.125 2024-08-14 22:37:13,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.404e+01 2.680e+01 3.035e+01 5.751e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-14 22:37:30,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2879720.0, ans=0.125 2024-08-14 22:37:50,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2879920.0, ans=0.125 2024-08-14 22:37:50,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.11 vs. limit=22.5 2024-08-14 22:37:51,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12650, loss[loss=0.1006, beats_loss=0.01011, ecapa_loss=0.0001756, whisper_loss=0.08869, over 21289.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001535, whisper_loss=0.09134, over 3899117.83 frames. ], batch size: 86, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:38:41,478 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 22:39:09,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12700, loss[loss=0.1001, beats_loss=0.01139, ecapa_loss=0.0001527, whisper_loss=0.08722, over 22821.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001531, whisper_loss=0.09116, over 3880622.13 frames. 
], batch size: 92, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:39:23,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880520.0, ans=0.1 2024-08-14 22:39:44,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2880620.0, ans=0.125 2024-08-14 22:39:48,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.668e+01 2.293e+01 2.553e+01 2.866e+01 4.469e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-14 22:40:07,159 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-14 22:40:28,371 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12750, loss[loss=0.1013, beats_loss=0.0123, ecapa_loss=0.0001741, whisper_loss=0.0873, over 18234.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.0001533, whisper_loss=0.09041, over 3903430.10 frames. ], batch size: 77, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:40:32,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2880920.0, ans=0.04949747468305833 2024-08-14 22:40:41,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2880920.0, ans=0.125 2024-08-14 22:41:15,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2024-08-14 22:41:23,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.17 vs. limit=22.5 2024-08-14 22:41:46,174 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 22:41:47,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12800, loss[loss=0.121, beats_loss=0.01038, ecapa_loss=0.0001552, whisper_loss=0.109, over 23586.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001545, whisper_loss=0.09073, over 3923224.75 frames. ], batch size: 93, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:41:56,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2881420.0, ans=0.04949747468305833 2024-08-14 22:41:58,518 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.93 vs. limit=10.0 2024-08-14 22:42:17,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2881520.0, ans=0.1 2024-08-14 22:42:27,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.348e+01 2.569e+01 2.991e+01 4.323e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-14 22:42:48,530 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 22:42:50,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2881820.0, ans=0.2 2024-08-14 22:43:07,190 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12850, loss[loss=0.1022, beats_loss=0.007879, ecapa_loss=0.0002055, whisper_loss=0.09222, over 16928.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.0001539, whisper_loss=0.0905, over 3881348.34 frames. 
], batch size: 69, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:43:17,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2881920.0, ans=0.2 2024-08-14 22:43:26,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2882020.0, ans=0.125 2024-08-14 22:43:31,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2882020.0, ans=0.125 2024-08-14 22:43:34,100 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 22:43:49,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2882120.0, ans=0.125 2024-08-14 22:43:51,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2882120.0, ans=0.125 2024-08-14 22:43:53,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2882220.0, ans=0.0 2024-08-14 22:43:58,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2882220.0, ans=0.0 2024-08-14 22:44:01,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-08-14 22:44:06,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882220.0, ans=0.1 2024-08-14 22:44:07,492 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 22:44:11,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2882320.0, ans=0.0 2024-08-14 22:44:15,035 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 22:44:20,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2882320.0, ans=0.125 2024-08-14 22:44:26,238 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12900, loss[loss=0.09701, beats_loss=0.009821, ecapa_loss=0.0002082, whisper_loss=0.0851, over 19551.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01076, ecapa_loss=0.0001549, whisper_loss=0.08974, over 3856397.43 frames. ], batch size: 84, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:44:30,376 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2024-08-14 22:44:38,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2882420.0, ans=0.1 2024-08-14 22:44:44,734 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 22:44:53,901 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 22:44:57,285 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 22:45:06,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.310e+01 2.541e+01 3.103e+01 4.872e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 22:45:13,007 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.04 vs. 
limit=15.0 2024-08-14 22:45:31,846 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-14 22:45:46,679 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 12950, loss[loss=0.1163, beats_loss=0.00902, ecapa_loss=0.0001636, whisper_loss=0.1056, over 20932.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001561, whisper_loss=0.0912, over 3865187.57 frames. ], batch size: 81, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:45:50,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2882920.0, ans=0.0 2024-08-14 22:45:53,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2882920.0, ans=0.1 2024-08-14 22:45:58,534 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2024-08-14 22:45:58,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-14 22:46:08,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2883020.0, ans=0.07 2024-08-14 22:46:10,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2883020.0, ans=0.0 2024-08-14 22:46:11,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2883020.0, ans=0.0 2024-08-14 22:47:05,138 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13000, loss[loss=0.101, beats_loss=0.01107, ecapa_loss=0.0001252, whisper_loss=0.08867, over 17332.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001551, whisper_loss=0.09171, over 3887561.75 frames. 
], batch size: 66, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:47:27,426 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 22:47:28,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2024-08-14 22:47:44,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.265e+01 2.555e+01 2.965e+01 9.531e+01, threshold=5.110e+01, percent-clipped=1.0 2024-08-14 22:47:54,356 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 22:47:55,522 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 22:48:19,374 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-14 22:48:19,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2883820.0, ans=0.125 2024-08-14 22:48:23,016 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13050, loss[loss=0.105, beats_loss=0.01139, ecapa_loss=0.0001366, whisper_loss=0.0922, over 23937.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001552, whisper_loss=0.09086, over 3862377.19 frames. ], batch size: 94, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:48:28,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2883920.0, ans=0.125 2024-08-14 22:48:29,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2883920.0, ans=0.0 2024-08-14 22:48:42,836 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
27 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-14 22:48:44,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-14 22:48:48,487 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 22:49:02,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2884120.0, ans=0.125 2024-08-14 22:49:03,522 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 22:49:08,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2884220.0, ans=10.0 2024-08-14 22:49:13,993 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 22:49:20,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884220.0, ans=0.1 2024-08-14 22:49:39,043 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13100, loss[loss=0.1147, beats_loss=0.008943, ecapa_loss=0.0001293, whisper_loss=0.1045, over 16797.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01072, ecapa_loss=0.0001534, whisper_loss=0.09003, over 3846768.28 frames. ], batch size: 63, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:50:01,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2884520.0, ans=0.125 2024-08-14 22:50:02,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. 
limit=15.0 2024-08-14 22:50:04,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2884520.0, ans=0.125 2024-08-14 22:50:06,672 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0 2024-08-14 22:50:11,321 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.78 vs. limit=10.0 2024-08-14 22:50:17,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=22.5 2024-08-14 22:50:17,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.252e+01 2.466e+01 2.762e+01 3.730e+01, threshold=4.933e+01, percent-clipped=0.0 2024-08-14 22:50:23,984 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 22:50:48,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-14 22:50:55,045 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13150, loss[loss=0.1323, beats_loss=0.009569, ecapa_loss=0.0001383, whisper_loss=0.1214, over 18542.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001534, whisper_loss=0.09069, over 3829256.54 frames. ], batch size: 69, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:50:55,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2884920.0, ans=0.0 2024-08-14 22:51:05,801 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 22:51:06,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2884920.0, ans=0.04949747468305833 2024-08-14 22:51:06,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2884920.0, ans=0.125 2024-08-14 22:51:11,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2885020.0, ans=0.0 2024-08-14 22:51:19,128 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 22:51:25,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2885120.0, ans=0.0 2024-08-14 22:51:39,216 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-14 22:51:44,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2885220.0, ans=0.1 2024-08-14 22:51:53,549 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-14 22:51:58,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2885320.0, ans=0.125 2024-08-14 22:52:07,109 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 22:52:11,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13200, loss[loss=0.09955, beats_loss=0.01217, ecapa_loss=0.0001254, whisper_loss=0.08612, over 23361.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001535, whisper_loss=0.09052, over 3829967.82 frames. 
], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:52:21,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2885420.0, ans=0.125 2024-08-14 22:52:25,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2885520.0, ans=0.0 2024-08-14 22:52:43,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2885620.0, ans=0.025 2024-08-14 22:52:46,899 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 22:52:49,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.330e+01 2.568e+01 2.844e+01 4.836e+01, threshold=5.136e+01, percent-clipped=0.0 2024-08-14 22:52:50,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2885620.0, ans=0.2 2024-08-14 22:53:00,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2885720.0, ans=0.0 2024-08-14 22:53:23,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2885820.0, ans=0.125 2024-08-14 22:53:27,865 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13250, loss[loss=0.09452, beats_loss=0.01159, ecapa_loss=0.0001189, whisper_loss=0.08174, over 23032.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001541, whisper_loss=0.09071, over 3802815.58 frames. ], batch size: 89, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:53:31,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2885920.0, ans=0.125 2024-08-14 22:53:42,906 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 22:53:43,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2886020.0, ans=0.2 2024-08-14 22:53:54,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2024-08-14 22:53:55,193 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.343e+01 2024-08-14 22:53:56,601 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 22:53:58,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2886120.0, ans=0.125 2024-08-14 22:54:19,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2886220.0, ans=0.125 2024-08-14 22:54:36,845 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 22:54:42,212 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13300, loss[loss=0.1054, beats_loss=0.01141, ecapa_loss=0.0001868, whisper_loss=0.09208, over 14607.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001535, whisper_loss=0.09162, over 3852460.45 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:54:50,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2886420.0, ans=0.125 2024-08-14 22:54:53,218 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
26 from LS+wenet, 14 from Vox, 51 fro AS 2024-08-14 22:54:59,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2886520.0, ans=0.0 2024-08-14 22:55:00,955 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-14 22:55:04,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2886520.0, ans=0.0 2024-08-14 22:55:08,628 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 22:55:20,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.319e+01 2.603e+01 2.951e+01 5.075e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 22:55:26,556 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2024-08-14 22:55:30,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2886720.0, ans=0.125 2024-08-14 22:55:41,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-14 22:55:44,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2886820.0, ans=0.125 2024-08-14 22:55:54,254 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 22:55:57,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2886920.0, ans=0.125 2024-08-14 22:55:58,362 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13350, loss[loss=0.1205, beats_loss=0.008321, ecapa_loss=0.0001755, whisper_loss=0.1105, over 16856.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001531, whisper_loss=0.09184, over 3903235.72 frames. ], batch size: 67, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:56:16,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2887020.0, ans=0.125 2024-08-14 22:56:29,082 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 22:56:38,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2887120.0, ans=0.2 2024-08-14 22:57:01,926 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 22:57:03,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2887320.0, ans=0.04949747468305833 2024-08-14 22:57:07,757 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 22:57:13,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13400, loss[loss=0.09321, beats_loss=0.01325, ecapa_loss=0.0001478, whisper_loss=0.07848, over 13869.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001525, whisper_loss=0.09112, over 3885176.18 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:57:15,360 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 22:57:19,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2887420.0, ans=0.125 2024-08-14 22:57:23,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2887420.0, ans=0.125 2024-08-14 22:57:25,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2887420.0, ans=0.0 2024-08-14 22:57:27,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-14 22:57:29,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2887520.0, ans=0.125 2024-08-14 22:57:32,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2887520.0, ans=0.1 2024-08-14 22:57:50,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.321e+01 2.514e+01 2.715e+01 4.037e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-14 22:57:58,930 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 22:58:14,640 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 22:58:20,763 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 22:58:21,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2887820.0, ans=0.0 2024-08-14 22:58:28,184 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13450, loss[loss=0.1026, beats_loss=0.01131, ecapa_loss=0.0001693, whisper_loss=0.08964, over 21225.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.000153, whisper_loss=0.09044, over 3874176.94 frames. ], batch size: 87, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:58:38,533 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 22:59:09,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=15.0 2024-08-14 22:59:18,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2888220.0, ans=0.125 2024-08-14 22:59:26,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2888320.0, ans=0.0 2024-08-14 22:59:35,682 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.94 vs. limit=22.5 2024-08-14 22:59:39,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2888420.0, ans=0.125 2024-08-14 22:59:40,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13500, loss[loss=0.07263, beats_loss=0.01143, ecapa_loss=0.0001841, whisper_loss=0.05936, over 14874.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001532, whisper_loss=0.09059, over 3880482.95 frames. 
], batch size: 64, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:59:44,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2888420.0, ans=0.125 2024-08-14 22:59:49,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2888420.0, ans=0.125 2024-08-14 23:00:04,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888520.0, ans=0.1 2024-08-14 23:00:15,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2888620.0, ans=0.0 2024-08-14 23:00:18,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.433e+01 2.626e+01 2.844e+01 4.134e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 23:00:24,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2888620.0, ans=0.025 2024-08-14 23:00:30,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2888720.0, ans=0.125 2024-08-14 23:00:31,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2888720.0, ans=0.0 2024-08-14 23:00:32,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-14 23:00:37,271 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 23:00:55,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2888920.0, ans=0.125 2024-08-14 23:00:56,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13550, loss[loss=0.08773, beats_loss=0.007477, ecapa_loss=0.0001785, whisper_loss=0.07847, over 17615.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001527, whisper_loss=0.09154, over 3888556.13 frames. ], batch size: 72, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:01:05,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=2888920.0, ans=0.2 2024-08-14 23:01:23,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2889020.0, ans=0.0 2024-08-14 23:01:25,928 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 23:01:27,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2889120.0, ans=0.0 2024-08-14 23:01:35,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2889120.0, ans=0.2 2024-08-14 23:01:36,249 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 23:01:39,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2024-08-14 23:01:47,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2024-08-14 23:02:06,487 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
31 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 23:02:06,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2889320.0, ans=0.125 2024-08-14 23:02:09,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13600, loss[loss=0.08584, beats_loss=0.01352, ecapa_loss=0.0001355, whisper_loss=0.07097, over 22554.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001525, whisper_loss=0.09165, over 3895566.82 frames. ], batch size: 93, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:02:11,047 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 23:02:23,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2889520.0, ans=0.0 2024-08-14 23:02:24,164 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-14 23:02:28,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2889520.0, ans=0.0 2024-08-14 23:02:32,513 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 23:02:32,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2889520.0, ans=0.125 2024-08-14 23:02:34,019 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 23:02:45,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.266e+01 2.507e+01 2.843e+01 1.129e+02, threshold=5.014e+01, percent-clipped=1.0 2024-08-14 23:02:54,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2889720.0, ans=0.125 2024-08-14 23:02:57,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2889720.0, ans=0.0 2024-08-14 23:03:05,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2889720.0, ans=10.0 2024-08-14 23:03:11,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.18 vs. limit=22.5 2024-08-14 23:03:16,712 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 23:03:18,014 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 23:03:22,481 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13650, loss[loss=0.1157, beats_loss=0.009902, ecapa_loss=0.0001719, whisper_loss=0.1041, over 19889.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001538, whisper_loss=0.09139, over 3875945.41 frames. 
], batch size: 80, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:03:24,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2889920.0, ans=0.1 2024-08-14 23:03:49,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2890020.0, ans=0.125 2024-08-14 23:03:56,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2890120.0, ans=0.95 2024-08-14 23:03:59,457 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=12.0 2024-08-14 23:04:00,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2890120.0, ans=0.0 2024-08-14 23:04:10,861 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 23:04:11,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2890220.0, ans=0.125 2024-08-14 23:04:17,652 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-14 23:04:28,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2890320.0, ans=0.09899494936611666 2024-08-14 23:04:36,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2890420.0, ans=0.05 2024-08-14 23:04:37,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13700, loss[loss=0.1165, beats_loss=0.008566, ecapa_loss=0.0001704, whisper_loss=0.1062, over 20677.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001537, whisper_loss=0.0915, over 3911438.33 frames. ], batch size: 80, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:04:59,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-08-14 23:05:07,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2890620.0, ans=0.125 2024-08-14 23:05:15,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.409e+01 2.626e+01 3.025e+01 7.983e+01, threshold=5.251e+01, percent-clipped=2.0 2024-08-14 23:05:20,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2890620.0, ans=0.125 2024-08-14 23:05:52,626 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13750, loss[loss=0.1097, beats_loss=0.008715, ecapa_loss=0.0001747, whisper_loss=0.09928, over 19261.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001548, whisper_loss=0.09163, over 3893718.49 frames. ], batch size: 79, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:05:57,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2890920.0, ans=0.2 2024-08-14 23:06:14,398 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-14 23:07:04,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2024-08-14 23:07:07,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13800, loss[loss=0.1086, beats_loss=0.008256, ecapa_loss=0.0001909, whisper_loss=0.09846, over 13248.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001542, whisper_loss=0.09139, over 3861352.67 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:07:35,076 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 23:07:46,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.261e+01 2.530e+01 3.041e+01 4.646e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-14 23:08:05,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2024-08-14 23:08:09,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2891820.0, ans=0.0 2024-08-14 23:08:14,437 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-14 23:08:22,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13850, loss[loss=0.09636, beats_loss=0.01361, ecapa_loss=0.0001367, whisper_loss=0.08138, over 16410.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.000153, whisper_loss=0.09141, over 3866290.01 frames. ], batch size: 69, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:08:29,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2891920.0, ans=0.125 2024-08-14 23:08:39,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2892020.0, ans=0.1 2024-08-14 23:08:58,034 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
26 from LS+wenet, 20 from Vox, 24 from AS 2024-08-14 23:09:03,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2892120.0, ans=0.07 2024-08-14 23:09:05,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2892120.0, ans=0.0 2024-08-14 23:09:15,315 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:09:18,543 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 11 from Vox, 33 from AS 2024-08-14 23:09:39,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2892420.0, ans=0.125 2024-08-14 23:09:40,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13900, loss[loss=0.1018, beats_loss=0.0128, ecapa_loss=0.0001342, whisper_loss=0.0877, over 23595.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001516, whisper_loss=0.0917, over 3889644.63 frames. ], batch size: 91, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:09:41,858 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 from AS 2024-08-14 23:09:51,325 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts.
18 from LS+wenet, 17 from Vox, 31 from AS 2024-08-14 23:09:53,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2892420.0, ans=0.0 2024-08-14 23:09:54,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2892520.0, ans=0.1 2024-08-14 23:10:05,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2892520.0, ans=0.125 2024-08-14 23:10:07,861 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 23:10:17,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2892620.0, ans=0.125 2024-08-14 23:10:20,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.401e+01 2.644e+01 3.017e+01 4.177e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-14 23:10:20,384 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-14 23:10:23,429 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS 2024-08-14 23:10:33,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2892720.0, ans=0.0 2024-08-14 23:10:35,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2892720.0, ans=0.0 2024-08-14 23:10:56,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 13950, loss[loss=0.1034, beats_loss=0.006938, ecapa_loss=0.0001774, whisper_loss=0.09471, over 18722.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0106, ecapa_loss=0.0001528, whisper_loss=0.09171, over 3911994.33 frames.
], batch size: 75, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:11:19,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2893020.0, ans=0.125 2024-08-14 23:11:21,320 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-14 23:11:25,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2893120.0, ans=0.1 2024-08-14 23:11:58,251 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 from AS 2024-08-14 23:12:03,038 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 from AS 2024-08-14 23:12:11,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14000, loss[loss=0.08245, beats_loss=0.01174, ecapa_loss=0.0001863, whisper_loss=0.06885, over 21933.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01062, ecapa_loss=0.0001522, whisper_loss=0.09184, over 3945142.78 frames. ], batch size: 94, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:12:38,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.61 vs.
limit=15.0 2024-08-14 23:12:43,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2893620.0, ans=0.1 2024-08-14 23:12:48,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2893620.0, ans=0.0 2024-08-14 23:12:50,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.312e+01 2.545e+01 2.868e+01 4.909e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-14 23:12:53,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-14 23:12:58,659 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 from AS 2024-08-14 23:13:08,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2893720.0, ans=0.0 2024-08-14 23:13:19,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2893820.0, ans=0.0 2024-08-14 23:13:22,007 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 23:13:27,431 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14050, loss[loss=0.1211, beats_loss=0.01121, ecapa_loss=0.000156, whisper_loss=0.1083, over 22167.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01061, ecapa_loss=0.0001527, whisper_loss=0.0915, over 3914624.25 frames. ], batch size: 87, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:13:35,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-08-14 23:13:36,471 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts.
18 from LS+wenet, 13 from Vox, 28 from AS 2024-08-14 23:13:41,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2894020.0, ans=0.125 2024-08-14 23:14:00,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2894120.0, ans=0.2 2024-08-14 23:14:05,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-14 23:14:08,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2894120.0, ans=0.125 2024-08-14 23:14:11,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2024-08-14 23:14:14,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2894220.0, ans=0.1 2024-08-14 23:14:26,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-08-14 23:14:27,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2894320.0, ans=0.1 2024-08-14 23:14:33,585 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 from AS 2024-08-14 23:14:33,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2894320.0, ans=0.125 2024-08-14 23:14:39,259 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts.
21 from LS+wenet, 18 from Vox, 36 from AS 2024-08-14 23:14:42,156 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14100, loss[loss=0.1225, beats_loss=0.009789, ecapa_loss=0.0001847, whisper_loss=0.1109, over 22815.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001528, whisper_loss=0.09203, over 3906249.31 frames. ], batch size: 91, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:14:44,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2894420.0, ans=0.0 2024-08-14 23:14:46,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2894420.0, ans=0.0 2024-08-14 23:14:54,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2894420.0, ans=0.125 2024-08-14 23:14:58,917 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 13 from Vox, 46 from AS 2024-08-14 23:15:15,255 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.57 vs.
limit=22.5 2024-08-14 23:15:20,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2894620.0, ans=0.125 2024-08-14 23:15:21,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.405e+01 2.712e+01 3.125e+01 2.483e+02, threshold=5.424e+01, percent-clipped=1.0 2024-08-14 23:15:29,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2894720.0, ans=0.125 2024-08-14 23:15:32,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2894720.0, ans=0.0 2024-08-14 23:15:54,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2894820.0, ans=0.125 2024-08-14 23:15:56,302 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 16 from Vox, 40 from AS 2024-08-14 23:15:57,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14150, loss[loss=0.1066, beats_loss=0.01208, ecapa_loss=0.0001091, whisper_loss=0.09347, over 21641.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001523, whisper_loss=0.09105, over 3865806.58 frames. ], batch size: 82, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:16:35,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2895120.0, ans=0.0 2024-08-14 23:16:40,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2895120.0, ans=0.125 2024-08-14 23:16:52,081 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 from AS 2024-08-14 23:16:52,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.30 vs.
limit=22.5 2024-08-14 23:17:12,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14200, loss[loss=0.1128, beats_loss=0.01048, ecapa_loss=0.0001482, whisper_loss=0.1008, over 18098.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001524, whisper_loss=0.09124, over 3857866.59 frames. ], batch size: 71, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:17:15,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2024-08-14 23:17:18,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0 2024-08-14 23:17:30,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2895520.0, ans=0.125 2024-08-14 23:17:47,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2895620.0, ans=0.125 2024-08-14 23:17:52,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.327e+01 2.651e+01 3.057e+01 2.487e+02, threshold=5.302e+01, percent-clipped=2.0 2024-08-14 23:17:53,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2895620.0, ans=10.0 2024-08-14 23:18:03,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-14 23:18:08,045 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 from AS 2024-08-14 23:18:10,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs.
limit=6.0 2024-08-14 23:18:21,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-08-14 23:18:23,352 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 from AS 2024-08-14 23:18:28,486 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14250, loss[loss=0.08469, beats_loss=0.01166, ecapa_loss=0.0001764, whisper_loss=0.07127, over 14496.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001515, whisper_loss=0.09139, over 3880215.70 frames. ], batch size: 64, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:18:29,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2895920.0, ans=0.125 2024-08-14 23:18:32,642 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS 2024-08-14 23:18:52,270 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 from AS 2024-08-14 23:18:58,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2896120.0, ans=0.125 2024-08-14 23:19:06,435 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS 2024-08-14 23:19:07,818 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts.
17 from LS+wenet, 16 from Vox, 20 from AS 2024-08-14 23:19:09,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2896120.0, ans=0.125 2024-08-14 23:19:18,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2896220.0, ans=0.2 2024-08-14 23:19:19,390 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-14 23:19:20,164 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 from AS 2024-08-14 23:19:25,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-14 23:19:26,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2896220.0, ans=0.0 2024-08-14 23:19:28,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2024-08-14 23:19:35,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-14 23:19:44,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2896420.0, ans=0.125 2024-08-14 23:19:45,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14300, loss[loss=0.08952, beats_loss=0.01423, ecapa_loss=0.0001129, whisper_loss=0.07416, over 20636.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001521, whisper_loss=0.09047, over 3872446.29 frames.
], batch size: 84, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:19:57,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2896420.0, ans=0.125 2024-08-14 23:20:00,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2896520.0, ans=0.1 2024-08-14 23:20:03,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2896520.0, ans=10.0 2024-08-14 23:20:04,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2896520.0, ans=0.0 2024-08-14 23:20:21,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2896620.0, ans=0.125 2024-08-14 23:20:23,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.328e+01 2.532e+01 2.890e+01 9.959e+01, threshold=5.063e+01, percent-clipped=3.0 2024-08-14 23:20:25,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2896620.0, ans=0.1 2024-08-14 23:20:28,657 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 from AS 2024-08-14 23:20:33,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2896720.0, ans=0.07 2024-08-14 23:20:45,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.02 vs.
limit=15.0 2024-08-14 23:20:49,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2896820.0, ans=0.07 2024-08-14 23:20:49,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2896820.0, ans=0.125 2024-08-14 23:20:58,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14350, loss[loss=0.08182, beats_loss=0.01356, ecapa_loss=0.0001191, whisper_loss=0.06707, over 21499.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001514, whisper_loss=0.09067, over 3884244.79 frames. ], batch size: 88, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:21:01,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2896920.0, ans=0.1 2024-08-14 23:21:02,068 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-14 23:21:19,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2897020.0, ans=0.125 2024-08-14 23:21:34,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2897120.0, ans=0.125 2024-08-14 23:22:01,776 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 from AS 2024-08-14 23:22:11,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14400, loss[loss=0.1137, beats_loss=0.01165, ecapa_loss=0.0001197, whisper_loss=0.1008, over 23369.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001516, whisper_loss=0.09082, over 3893700.16 frames.
], batch size: 88, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:22:12,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2897420.0, ans=0.0 2024-08-14 23:22:26,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=2897520.0, ans=0.02 2024-08-14 23:22:27,850 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.579e-03 2024-08-14 23:22:32,209 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 12 from Vox, 35 from AS 2024-08-14 23:22:51,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.446e+01 2.693e+01 3.071e+01 5.241e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-14 23:22:55,318 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-08-14 23:23:24,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2897820.0, ans=0.2 2024-08-14 23:23:26,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 20, batch 14450, loss[loss=0.09947, beats_loss=0.01328, ecapa_loss=0.0001325, whisper_loss=0.08487, over 16970.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001508, whisper_loss=0.09028, over 3894462.34 frames.
], batch size: 66, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:23:30,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2897920.0, ans=0.125 2024-08-14 23:23:41,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2898020.0, ans=0.125 2024-08-14 23:23:42,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-14 23:23:50,686 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 from AS 2024-08-14 23:23:52,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2898020.0, ans=0.125 2024-08-14 23:23:53,656 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 from AS 2024-08-14 23:24:07,835 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 31 from LS+wenet, 18 from Vox, 15 from AS 2024-08-14 23:24:13,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2898220.0, ans=0.05 2024-08-14 23:24:16,483 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 23:25:02,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 0, loss[loss=0.1004, beats_loss=0.008401, ecapa_loss=0.0001627, whisper_loss=0.09039, over 20026.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.008401, ecapa_loss=0.0001627, whisper_loss=0.09039, over 20026.00 frames.
], batch size: 75, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:25:02,511 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-14 23:25:46,227 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005489, whisper_loss=0.2481, over 922467.00 frames. 2024-08-14 23:26:02,227 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on SV_voxceleb1: loss=0.004256, beats_loss=0, ecapa_loss=0.0004256, whisper_loss=0, over 939242.00 frames. 2024-08-14 23:28:02,824 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on AT_audioset: loss=0.02343, beats_loss=0.02343, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 23:28:02,828 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-14 23:28:04,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2898350.0, ans=0.2 2024-08-14 23:28:04,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.76 vs. limit=15.0 2024-08-14 23:28:22,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2898350.0, ans=0.125 2024-08-14 23:28:30,726 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 9 from Vox, 32 from AS 2024-08-14 23:29:10,924 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 from AS 2024-08-14 23:29:18,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2898550.0, ans=0.125 2024-08-14 23:29:18,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2898550.0, ans=0.2 2024-08-14 23:29:23,171 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts.
28 from LS+wenet, 15 from Vox, 41 from AS 2024-08-14 23:29:30,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.477e+01 2.723e+01 3.011e+01 4.734e+01, threshold=5.445e+01, percent-clipped=0.0 2024-08-14 23:29:55,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2898750.0, ans=0.125 2024-08-14 23:30:02,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2898750.0, ans=0.125 2024-08-14 23:30:06,955 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS 2024-08-14 23:30:13,503 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 50, loss[loss=0.09342, beats_loss=0.01011, ecapa_loss=0.0001238, whisper_loss=0.08207, over 17551.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.00978, ecapa_loss=0.0001551, whisper_loss=0.09036, over 875017.68 frames. ], batch size: 65, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:30:43,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2898950.0, ans=0.125 2024-08-14 23:30:49,554 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 14 from Vox, 34 from AS 2024-08-14 23:31:18,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2899050.0, ans=0.0 2024-08-14 23:31:32,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2899150.0, ans=0.2 2024-08-14 23:32:13,753 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 100, loss[loss=0.1078, beats_loss=0.009757, ecapa_loss=0.0001523, whisper_loss=0.0965, over 20325.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009679, ecapa_loss=0.0001517, whisper_loss=0.09115, over 1526893.90 frames.
], batch size: 82, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:32:56,833 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 23:33:17,939 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 from AS 2024-08-14 23:33:30,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2899650.0, ans=0.04949747468305833 2024-08-14 23:33:31,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.639e+01 2.885e+01 3.247e+01 3.567e+02, threshold=5.770e+01, percent-clipped=1.0 2024-08-14 23:33:38,634 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 from AS 2024-08-14 23:33:42,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2899650.0, ans=0.125 2024-08-14 23:33:54,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-14 23:33:56,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0 2024-08-14 23:34:07,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 150, loss[loss=0.09165, beats_loss=0.01032, ecapa_loss=0.0001078, whisper_loss=0.08026, over 16695.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.009582, ecapa_loss=0.0001527, whisper_loss=0.09236, over 2024467.17 frames. ], batch size: 63, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:34:18,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2899850.0, ans=0.0 2024-08-14 23:34:26,893 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts.
28 from LS+wenet, 20 from Vox, 27 from AS 2024-08-14 23:34:32,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2899950.0, ans=0.0 2024-08-14 23:34:39,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-14 23:34:47,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2900050.0, ans=0.05 2024-08-14 23:34:48,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2900050.0, ans=0.0 2024-08-14 23:35:03,944 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 27 from Vox, 35 from AS 2024-08-14 23:35:08,305 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 21 from Vox, 20 from AS 2024-08-14 23:35:10,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2900150.0, ans=0.125 2024-08-14 23:35:15,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2900250.0, ans=0.2 2024-08-14 23:35:16,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2900250.0, ans=0.0 2024-08-14 23:35:23,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2900250.0, ans=0.07 2024-08-14 23:35:26,936 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-08-14 23:35:31,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 200, loss[loss=0.09808, beats_loss=0.009803, ecapa_loss=0.00013, whisper_loss=0.08698, over 16004.00 frames.
], tot_loss[loss=0.1039, beats_loss=0.009689, ecapa_loss=0.0001562, whisper_loss=0.09261, over 2419436.17 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:36:05,936 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.919e+01 2024-08-14 23:36:12,952 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 23:36:23,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.304e+01 2.562e+01 2.943e+01 6.143e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 23:36:24,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2900650.0, ans=0.1 2024-08-14 23:36:35,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2900750.0, ans=0.125 2024-08-14 23:36:49,625 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 250, loss[loss=0.1143, beats_loss=0.008397, ecapa_loss=0.0001627, whisper_loss=0.1043, over 17132.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.009851, ecapa_loss=0.0001553, whisper_loss=0.09252, over 2747400.92 frames. ], batch size: 66, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:37:03,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2900950.0, ans=0.0 2024-08-14 23:37:05,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2900950.0, ans=0.0 2024-08-14 23:37:27,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.59 vs.
limit=22.5 2024-08-14 23:37:46,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2901250.0, ans=10.0 2024-08-14 23:37:52,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2901250.0, ans=0.09899494936611666 2024-08-14 23:37:58,255 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-14 23:38:01,939 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 300, loss[loss=0.08806, beats_loss=0.01045, ecapa_loss=0.0001571, whisper_loss=0.07604, over 19124.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01004, ecapa_loss=0.0001552, whisper_loss=0.09129, over 2970245.46 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:38:18,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2901450.0, ans=0.0 2024-08-14 23:38:19,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2901450.0, ans=0.125 2024-08-14 23:38:33,865 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 from AS 2024-08-14 23:38:41,056 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 from AS 2024-08-14 23:38:43,713 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts.
18 from LS+wenet, 18 from Vox, 29 from AS 2024-08-14 23:38:49,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.216e+01 2.512e+01 2.821e+01 4.988e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-14 23:38:51,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2901650.0, ans=0.125 2024-08-14 23:39:01,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2901750.0, ans=0.0 2024-08-14 23:39:13,188 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 350, loss[loss=0.08925, beats_loss=0.01159, ecapa_loss=0.0001457, whisper_loss=0.0762, over 21330.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01026, ecapa_loss=0.0001532, whisper_loss=0.09056, over 3174556.97 frames. ], batch size: 89, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:39:20,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=15.0 2024-08-14 23:39:21,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-14 23:39:24,810 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 25 from Vox, 23 from AS 2024-08-14 23:39:49,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2902050.0, ans=0.0 2024-08-14 23:39:51,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2024-08-14 23:40:03,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.54 vs.
limit=15.0 2024-08-14 23:40:08,641 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 from AS 2024-08-14 23:40:09,927 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 from AS 2024-08-14 23:40:14,420 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 from AS 2024-08-14 23:40:23,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2902250.0, ans=0.07 2024-08-14 23:40:23,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-14 23:40:27,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 400, loss[loss=0.09797, beats_loss=0.01101, ecapa_loss=0.0001642, whisper_loss=0.08532, over 21791.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001532, whisper_loss=0.09078, over 3331382.44 frames. ], batch size: 90, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:40:50,266 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 from AS 2024-08-14 23:41:02,819 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 16 from Vox, 46 from AS 2024-08-14 23:41:05,526 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.39 vs.
limit=15.0 2024-08-14 23:41:12,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2902650.0, ans=0.125 2024-08-14 23:41:12,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902650.0, ans=0.1 2024-08-14 23:41:15,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2902650.0, ans=0.125 2024-08-14 23:41:17,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2902650.0, ans=0.125 2024-08-14 23:41:18,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.395e+01 2.699e+01 3.154e+01 2.910e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-14 23:41:28,476 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 from AS 2024-08-14 23:41:30,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0 2024-08-14 23:41:40,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2902750.0, ans=0.125 2024-08-14 23:41:40,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2024-08-14 23:41:42,667 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts.
27 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 23:41:43,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2902750.0, ans=0.125 2024-08-14 23:41:44,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2902850.0, ans=0.0 2024-08-14 23:41:45,443 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 450, loss[loss=0.1042, beats_loss=0.008325, ecapa_loss=0.0001799, whisper_loss=0.09405, over 15928.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001525, whisper_loss=0.09041, over 3471813.12 frames. ], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:41:47,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2902850.0, ans=0.2 2024-08-14 23:41:49,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2902850.0, ans=0.0 2024-08-14 23:41:49,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2902850.0, ans=15.0 2024-08-14 23:42:29,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2903050.0, ans=0.0 2024-08-14 23:42:30,436 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-14 23:42:35,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2024-08-14 23:42:36,543 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
27 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-14 23:42:43,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2903150.0, ans=0.0 2024-08-14 23:42:45,977 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 20 from LS+wenet, 24 from Vox, 51 fro AS 2024-08-14 23:42:57,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2903250.0, ans=0.125 2024-08-14 23:43:00,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2903250.0, ans=0.1 2024-08-14 23:43:04,656 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 500, loss[loss=0.1114, beats_loss=0.00972, ecapa_loss=0.0001306, whisper_loss=0.1004, over 19659.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001525, whisper_loss=0.09067, over 3560090.74 frames. ], batch size: 75, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:43:10,797 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 23:43:16,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2903350.0, ans=0.125 2024-08-14 23:43:18,978 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 23:43:20,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2903450.0, ans=0.2 2024-08-14 23:43:34,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2903550.0, ans=0.125 2024-08-14 23:43:35,854 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 23:43:36,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2903550.0, ans=0.125 2024-08-14 23:43:41,683 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-14 23:43:55,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.351e+01 2.577e+01 2.922e+01 3.343e+02, threshold=5.154e+01, percent-clipped=2.0 2024-08-14 23:44:03,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2903750.0, ans=0.125 2024-08-14 23:44:05,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2903750.0, ans=0.0 2024-08-14 23:44:10,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2903750.0, ans=0.0 2024-08-14 23:44:18,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 550, loss[loss=0.123, beats_loss=0.00945, ecapa_loss=0.0001769, whisper_loss=0.1118, over 22082.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001515, whisper_loss=0.09088, over 3652480.64 frames. 
], batch size: 89, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:44:19,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2903850.0, ans=0.1 2024-08-14 23:44:21,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2903850.0, ans=0.125 2024-08-14 23:44:24,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2903850.0, ans=0.125 2024-08-14 23:44:29,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2903850.0, ans=0.5 2024-08-14 23:44:40,634 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 23:45:00,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2904150.0, ans=0.0 2024-08-14 23:45:03,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2904150.0, ans=0.125 2024-08-14 23:45:05,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2904150.0, ans=0.09899494936611666 2024-08-14 23:45:14,842 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.58 vs. limit=10.0 2024-08-14 23:45:20,849 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 23:45:24,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 600, loss[loss=0.09202, beats_loss=0.0113, ecapa_loss=0.0001195, whisper_loss=0.07952, over 22836.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01041, ecapa_loss=0.0001506, whisper_loss=0.09152, over 3717119.12 frames. 
], batch size: 88, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:45:26,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2904350.0, ans=0.125 2024-08-14 23:45:26,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2904350.0, ans=0.0 2024-08-14 23:45:40,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-14 23:45:47,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2904450.0, ans=0.0 2024-08-14 23:45:57,530 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 34 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 23:45:58,192 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=22.5 2024-08-14 23:45:59,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-14 23:46:02,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=22.5 2024-08-14 23:46:08,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.318e+01 2.599e+01 2.895e+01 9.632e+01, threshold=5.197e+01, percent-clipped=3.0 2024-08-14 23:46:11,877 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 23:46:13,151 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
19 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 23:46:14,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2904650.0, ans=0.1 2024-08-14 23:46:19,646 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 23:46:23,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2904750.0, ans=0.125 2024-08-14 23:46:29,974 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 650, loss[loss=0.08917, beats_loss=0.01429, ecapa_loss=0.0001224, whisper_loss=0.07366, over 18733.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01039, ecapa_loss=0.0001501, whisper_loss=0.09188, over 3741526.77 frames. ], batch size: 74, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:46:34,297 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 23:46:35,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2904850.0, ans=0.1 2024-08-14 23:46:39,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2904850.0, ans=0.035 2024-08-14 23:46:59,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2905050.0, ans=0.0 2024-08-14 23:47:08,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2905150.0, ans=0.07 2024-08-14 23:47:34,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2905250.0, ans=0.0 2024-08-14 23:47:36,377 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 700, loss[loss=0.113, beats_loss=0.009149, ecapa_loss=0.0001726, whisper_loss=0.1021, over 21814.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01038, ecapa_loss=0.0001519, whisper_loss=0.09181, over 3763063.74 frames. ], batch size: 87, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:47:40,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2905350.0, ans=0.125 2024-08-14 23:47:44,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2905350.0, ans=0.125 2024-08-14 23:48:01,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-14 23:48:15,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2905650.0, ans=0.125 2024-08-14 23:48:21,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.342e+01 2.517e+01 2.889e+01 6.845e+01, threshold=5.033e+01, percent-clipped=2.0 2024-08-14 23:48:27,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.99 vs. limit=22.5 2024-08-14 23:48:41,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 750, loss[loss=0.07291, beats_loss=0.01211, ecapa_loss=0.0001696, whisper_loss=0.0591, over 18400.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001511, whisper_loss=0.09032, over 3755378.67 frames. 
], batch size: 78, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:48:42,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2905850.0, ans=0.2 2024-08-14 23:48:47,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2905850.0, ans=0.125 2024-08-14 23:49:12,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0 2024-08-14 23:49:18,915 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 23:49:27,675 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 29 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 23:49:34,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2906250.0, ans=10.0 2024-08-14 23:49:46,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2906350.0, ans=10.0 2024-08-14 23:49:47,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 800, loss[loss=0.1175, beats_loss=0.008627, ecapa_loss=0.000162, whisper_loss=0.1073, over 22494.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001507, whisper_loss=0.09045, over 3810950.71 frames. ], batch size: 88, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:49:51,308 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 23:49:53,816 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 23:49:59,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2906450.0, ans=0.0 2024-08-14 23:50:10,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=12.0 2024-08-14 23:50:31,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.222e+01 2.469e+01 2.749e+01 4.131e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-14 23:50:33,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-14 23:50:35,763 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 23:50:37,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2906650.0, ans=0.125 2024-08-14 23:50:42,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2906750.0, ans=0.2 2024-08-14 23:50:52,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 850, loss[loss=0.09319, beats_loss=0.01003, ecapa_loss=0.0001253, whisper_loss=0.08191, over 19536.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001507, whisper_loss=0.0894, over 3817859.41 frames. ], batch size: 75, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:51:03,207 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 23:51:25,689 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 23:51:29,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2907050.0, ans=0.125 2024-08-14 23:51:36,138 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 23:51:56,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2907250.0, ans=0.125 2024-08-14 23:51:58,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 900, loss[loss=0.09641, beats_loss=0.0118, ecapa_loss=0.0001353, whisper_loss=0.08326, over 18185.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001498, whisper_loss=0.08912, over 3817073.85 frames. ], batch size: 73, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:52:11,925 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 23:52:18,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2907450.0, ans=0.0 2024-08-14 23:52:21,580 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 23:52:21,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2907450.0, ans=0.1 2024-08-14 23:52:26,594 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 23:52:27,888 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 23:52:31,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2907550.0, ans=0.0 2024-08-14 23:52:34,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2907550.0, ans=0.025 2024-08-14 23:52:35,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2907550.0, ans=0.125 2024-08-14 23:52:43,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.513e+01 2.774e+01 3.197e+01 6.969e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 23:52:53,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2907750.0, ans=0.05 2024-08-14 23:52:54,267 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 23:53:01,070 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-14 23:53:02,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=12.0 2024-08-14 23:53:05,094 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 950, loss[loss=0.09937, beats_loss=0.008872, ecapa_loss=0.0001682, whisper_loss=0.08882, over 13682.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01061, ecapa_loss=0.000149, whisper_loss=0.08927, over 3816262.62 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:53:13,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-08-14 23:53:14,048 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 23:53:19,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0 2024-08-14 23:53:22,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2907950.0, ans=0.125 2024-08-14 23:53:30,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2908050.0, ans=0.1 2024-08-14 23:53:43,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2908150.0, ans=0.2 2024-08-14 23:54:00,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2908250.0, ans=0.1 2024-08-14 23:54:08,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2024-08-14 23:54:10,591 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1000, loss[loss=0.09332, beats_loss=0.01086, ecapa_loss=0.0002007, whisper_loss=0.08045, over 14337.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001497, whisper_loss=0.08961, over 3801145.35 frames. ], batch size: 62, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:54:19,902 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-14 23:54:23,585 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 23:54:25,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2908450.0, ans=0.125 2024-08-14 23:54:55,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.358e+01 2.563e+01 2.906e+01 4.216e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-14 23:54:57,744 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 23:55:03,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2024-08-14 23:55:15,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1050, loss[loss=0.08716, beats_loss=0.01195, ecapa_loss=0.0001493, whisper_loss=0.07372, over 14378.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001491, whisper_loss=0.08932, over 3774045.61 frames. ], batch size: 58, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:55:32,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2908950.0, ans=0.0 2024-08-14 23:55:41,108 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:55:43,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2909050.0, ans=0.0 2024-08-14 23:55:55,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2909150.0, ans=0.1 2024-08-14 23:55:57,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.56 vs. limit=22.5 2024-08-14 23:55:58,235 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 23:56:18,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=2909250.0, ans=15.0 2024-08-14 23:56:21,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1100, loss[loss=0.127, beats_loss=0.008497, ecapa_loss=0.0001367, whisper_loss=0.1172, over 21533.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001491, whisper_loss=0.08995, over 3779776.16 frames. ], batch size: 79, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:56:24,997 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 23:56:38,668 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:56:39,641 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 23:56:42,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2909450.0, ans=0.125 2024-08-14 23:56:45,994 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
21 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 23:56:50,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2909550.0, ans=0.125 2024-08-14 23:57:05,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.324e+01 2.535e+01 2.853e+01 4.555e+01, threshold=5.069e+01, percent-clipped=0.0 2024-08-14 23:57:19,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2909750.0, ans=0.015 2024-08-14 23:57:24,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2909750.0, ans=0.2 2024-08-14 23:57:26,564 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1150, loss[loss=0.118, beats_loss=0.006529, ecapa_loss=0.0001357, whisper_loss=0.1101, over 16977.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001486, whisper_loss=0.0902, over 3804408.25 frames. ], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:57:28,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2909850.0, ans=0.125 2024-08-14 23:57:28,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2024-08-14 23:57:29,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.75 vs. limit=22.5 2024-08-14 23:57:53,181 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 23:58:06,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2910150.0, ans=0.0 2024-08-14 23:58:16,509 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
22 from LS+wenet, 18 from Vox, 37 from AS
2024-08-14 23:58:18,795 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08089441806077957, model_norm_threshold=50.69233322143555
2024-08-14 23:58:18,977 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.506e+05, grad_sumsq=1.506e+05, orig_rms_sq=1.000e+00
2024-08-14 23:58:26,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2910250.0, ans=0.0
2024-08-14 23:58:27,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2910250.0, ans=0.2
2024-08-14 23:58:32,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1200, loss[loss=0.1193, beats_loss=0.009332, ecapa_loss=0.0001558, whisper_loss=0.1084, over 15926.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001483, whisper_loss=0.08956, over 3786588.89 frames. ], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 23:58:33,780 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 23:58:46,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2910450.0, ans=0.125
2024-08-14 23:58:47,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2910450.0, ans=0.125
2024-08-14 23:58:48,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2910450.0, ans=0.125
2024-08-14 23:58:50,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2910450.0, ans=0.125
2024-08-14 23:59:00,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0
2024-08-14 23:59:04,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2910550.0, ans=0.0
2024-08-14 23:59:12,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2910650.0, ans=0.125
2024-08-14 23:59:18,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.273e+01 2.471e+01 2.921e+01 6.266e+02, threshold=4.943e+01, percent-clipped=3.0
2024-08-14 23:59:30,352 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 10 from Vox, 29 from AS
2024-08-14 23:59:39,320 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.44 vs. limit=22.5
2024-08-14 23:59:39,931 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1250, loss[loss=0.08982, beats_loss=0.01031, ecapa_loss=0.0001274, whisper_loss=0.07824, over 14644.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01066, ecapa_loss=0.0001478, whisper_loss=0.08913, over 3804746.19 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 23:59:45,364 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 26 from Vox, 25 from AS
2024-08-14 23:59:47,778 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 23:59:49,766 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 11 from Vox, 34 from AS
2024-08-15 00:00:16,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2911050.0, ans=0.1
2024-08-15 00:00:19,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2911050.0, ans=0.0
2024-08-15 00:00:30,938 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 from AS
2024-08-15 00:00:43,274 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 16 from Vox, 36 from AS
2024-08-15 00:00:43,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2911250.0, ans=0.0
2024-08-15 00:00:51,596 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1300, loss[loss=0.1048, beats_loss=0.01078, ecapa_loss=0.0001661, whisper_loss=0.09236, over 18698.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001485, whisper_loss=0.08974, over 3813093.00 frames. ], batch size: 77, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:00:53,535 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=22.5
2024-08-15 00:00:59,267 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 12 from LS+wenet, 20 from Vox, 22 from AS
2024-08-15 00:01:13,119 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 from AS
2024-08-15 00:01:31,998 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 25 from Vox, 28 from AS
2024-08-15 00:01:40,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.220e+01 2.514e+01 2.820e+01 5.708e+01, threshold=5.028e+01, percent-clipped=1.0
2024-08-15 00:02:07,102 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1350, loss[loss=0.1029, beats_loss=0.01153, ecapa_loss=0.0001216, whisper_loss=0.09016, over 23313.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.0904, over 3825602.67 frames. ], batch size: 91, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:02:21,547 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS
2024-08-15 00:02:46,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2912050.0, ans=0.2
2024-08-15 00:02:52,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0
2024-08-15 00:03:08,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2912150.0, ans=0.125
2024-08-15 00:03:14,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2912250.0, ans=0.125
2024-08-15 00:03:26,175 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1400, loss[loss=0.1198, beats_loss=0.008425, ecapa_loss=0.0001556, whisper_loss=0.1098, over 22194.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001484, whisper_loss=0.08965, over 3816225.61 frames.
], batch size: 85, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:03:39,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2912350.0, ans=0.0
2024-08-15 00:03:48,801 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 23 from LS+wenet, 17 from Vox, 16 from AS
2024-08-15 00:04:07,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2912550.0, ans=0.125
2024-08-15 00:04:09,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0
2024-08-15 00:04:18,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.213e+01 2.409e+01 2.857e+01 4.258e+01, threshold=4.818e+01, percent-clipped=0.0
2024-08-15 00:05:00,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1450, loss[loss=0.09848, beats_loss=0.007676, ecapa_loss=0.0001536, whisper_loss=0.08927, over 18735.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001483, whisper_loss=0.08943, over 3829548.19 frames. ], batch size: 73, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:05:11,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2912850.0, ans=0.0
2024-08-15 00:05:15,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2912850.0, ans=0.125
2024-08-15 00:05:23,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2912950.0, ans=0.125
2024-08-15 00:05:34,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2913050.0, ans=0.1
2024-08-15 00:05:34,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0
2024-08-15 00:05:38,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2913050.0, ans=0.125
2024-08-15 00:05:57,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0
2024-08-15 00:05:59,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2913150.0, ans=0.125
2024-08-15 00:06:21,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2913250.0, ans=0.125
2024-08-15 00:06:25,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1500, loss[loss=0.09019, beats_loss=0.009318, ecapa_loss=0.0001817, whisper_loss=0.07905, over 19403.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.08854, over 3816561.46 frames. ], batch size: 80, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:06:42,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2913450.0, ans=0.125
2024-08-15 00:06:47,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.60 vs. limit=6.0
2024-08-15 00:06:48,275 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 00:06:55,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0
2024-08-15 00:07:01,322 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS
2024-08-15 00:07:05,963 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 15 from Vox, 39 from AS
2024-08-15 00:07:09,505 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.024e-02
2024-08-15 00:07:20,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.340e+01 2.607e+01 2.901e+01 2.472e+02, threshold=5.215e+01, percent-clipped=2.0
2024-08-15 00:07:21,876 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 from AS
2024-08-15 00:07:31,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=15.0
2024-08-15 00:07:44,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2913750.0, ans=0.125
2024-08-15 00:07:47,093 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS
2024-08-15 00:07:54,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1550, loss[loss=0.114, beats_loss=0.01049, ecapa_loss=0.0001539, whisper_loss=0.102, over 21692.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001479, whisper_loss=0.08949, over 3842509.71 frames. ], batch size: 87, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:08:07,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2913850.0, ans=0.1
2024-08-15 00:08:09,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.64 vs. limit=15.0
2024-08-15 00:08:10,926 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 from AS
2024-08-15 00:08:18,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2913950.0, ans=0.125
2024-08-15 00:08:38,663 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 17 from Vox, 49 from AS
2024-08-15 00:09:20,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0
2024-08-15 00:09:27,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2914250.0, ans=0.1
2024-08-15 00:09:36,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1600, loss[loss=0.1165, beats_loss=0.01095, ecapa_loss=0.000122, whisper_loss=0.1043, over 24365.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.08929, over 3853013.54 frames.
], batch size: 91, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:09:40,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2914350.0, ans=0.125
2024-08-15 00:09:59,326 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS
2024-08-15 00:10:29,818 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 from AS
2024-08-15 00:10:52,650 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS
2024-08-15 00:10:57,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.368e+01 2.559e+01 2.879e+01 3.704e+01, threshold=5.117e+01, percent-clipped=0.0
2024-08-15 00:11:36,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0
2024-08-15 00:11:36,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1650, loss[loss=0.1063, beats_loss=0.01177, ecapa_loss=0.0001246, whisper_loss=0.09332, over 20716.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001474, whisper_loss=0.08981, over 3873706.89 frames. ], batch size: 80, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:12:07,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914950.0, ans=0.1
2024-08-15 00:12:18,238 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 from AS
2024-08-15 00:12:30,511 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 from AS
2024-08-15 00:12:33,919 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 00:12:49,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2915150.0, ans=0.07
2024-08-15 00:12:49,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0
2024-08-15 00:13:01,793 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 00:13:33,647 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 from AS
2024-08-15 00:13:36,044 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1700, loss[loss=0.09108, beats_loss=0.01108, ecapa_loss=0.0001165, whisper_loss=0.07884, over 15995.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.000148, whisper_loss=0.09081, over 3842997.58 frames. ], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:13:38,783 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 00:13:53,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2915350.0, ans=0.025
2024-08-15 00:13:54,123 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 from AS
2024-08-15 00:14:06,593 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.64 vs. limit=10.0
2024-08-15 00:14:55,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.594e+01 2.900e+01 5.252e+01, threshold=5.187e+01, percent-clipped=1.0
2024-08-15 00:15:20,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2024-08-15 00:15:30,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1750, loss[loss=0.08783, beats_loss=0.01044, ecapa_loss=0.0001693, whisper_loss=0.0757, over 22335.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001484, whisper_loss=0.09007, over 3830826.97 frames. ], batch size: 93, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:15:41,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2915850.0, ans=0.125
2024-08-15 00:16:04,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2916050.0, ans=0.2
2024-08-15 00:16:04,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0
2024-08-15 00:16:13,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2916050.0, ans=0.1
2024-08-15 00:16:42,964 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1800, loss[loss=0.1089, beats_loss=0.008477, ecapa_loss=0.0001798, whisper_loss=0.09861, over 21913.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001492, whisper_loss=0.09026, over 3841394.32 frames. ], batch size: 87, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:16:43,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2916350.0, ans=0.1
2024-08-15 00:16:44,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2916350.0, ans=0.125
2024-08-15 00:16:47,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2916350.0, ans=0.125
2024-08-15 00:16:47,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2916350.0, ans=0.125
2024-08-15 00:17:15,650 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 13 from Vox, 30 from AS
2024-08-15 00:17:23,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2916650.0, ans=0.125
2024-08-15 00:17:25,890 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts.
23 from LS+wenet, 10 from Vox, 31 from AS
2024-08-15 00:17:29,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.322e+01 2.601e+01 3.039e+01 2.068e+02, threshold=5.202e+01, percent-clipped=5.0
2024-08-15 00:17:37,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2916750.0, ans=0.125
2024-08-15 00:17:37,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2916750.0, ans=0.0
2024-08-15 00:17:38,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2916750.0, ans=0.0
2024-08-15 00:17:44,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2916750.0, ans=0.125
2024-08-15 00:17:45,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2916750.0, ans=0.2
2024-08-15 00:17:46,077 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0
2024-08-15 00:17:52,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1850, loss[loss=0.09279, beats_loss=0.01205, ecapa_loss=0.0001404, whisper_loss=0.07933, over 22431.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001499, whisper_loss=0.0901, over 3837635.39 frames. ], batch size: 90, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:17:57,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2916850.0, ans=0.5
2024-08-15 00:17:58,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0
2024-08-15 00:17:59,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2916850.0, ans=0.0
2024-08-15 00:18:04,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2916850.0, ans=0.0
2024-08-15 00:18:07,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2916950.0, ans=0.5
2024-08-15 00:18:07,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2916950.0, ans=0.125
2024-08-15 00:18:17,528 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 22 from Vox, 27 from AS
2024-08-15 00:18:18,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2916950.0, ans=0.125
2024-08-15 00:18:24,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2917050.0, ans=0.125
2024-08-15 00:18:27,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2917050.0, ans=0.09899494936611666
2024-08-15 00:18:34,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2917050.0, ans=0.1
2024-08-15 00:18:45,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2024-08-15 00:18:53,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2917250.0, ans=0.125
2024-08-15 00:19:02,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2917250.0, ans=0.1
2024-08-15 00:19:10,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1900, loss[loss=0.0885, beats_loss=0.01251, ecapa_loss=0.0001306, whisper_loss=0.07469, over 16296.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001505, whisper_loss=0.08978, over 3804835.34 frames. ], batch size: 65, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:19:11,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2917350.0, ans=0.125
2024-08-15 00:19:38,123 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 12 from Vox, 21 from AS
2024-08-15 00:19:43,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5
2024-08-15 00:19:56,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2917550.0, ans=0.0
2024-08-15 00:20:04,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5
2024-08-15 00:20:06,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.388e+01 2.711e+01 3.006e+01 3.511e+02, threshold=5.422e+01, percent-clipped=5.0
2024-08-15 00:20:09,286 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 from AS
2024-08-15 00:20:13,676 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 from AS
2024-08-15 00:20:15,640 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 12 from Vox, 32 from AS
2024-08-15 00:20:30,014 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 1950, loss[loss=0.07897, beats_loss=0.01243, ecapa_loss=0.000138, whisper_loss=0.06516, over 16350.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001496, whisper_loss=0.08946, over 3787028.15 frames. ], batch size: 65, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:21:00,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2917950.0, ans=0.0
2024-08-15 00:21:06,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2918050.0, ans=0.0
2024-08-15 00:21:09,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2918050.0, ans=0.125
2024-08-15 00:21:34,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.57 vs. limit=10.0
2024-08-15 00:21:46,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2918250.0, ans=0.125
2024-08-15 00:21:51,655 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 17 from LS+wenet, 21 from Vox, 40 from AS
2024-08-15 00:21:52,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2000, loss[loss=0.07504, beats_loss=0.01294, ecapa_loss=0.0001478, whisper_loss=0.06062, over 18303.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001492, whisper_loss=0.08912, over 3811242.69 frames. ], batch size: 78, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:22:47,939 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
33 from LS+wenet, 28 from Vox, 32 from AS
2024-08-15 00:22:49,148 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.256e+01 2.494e+01 2.913e+01 5.528e+01, threshold=4.988e+01, percent-clipped=1.0
2024-08-15 00:23:04,687 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 from AS
2024-08-15 00:23:09,319 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 15 from Vox, 36 from AS
2024-08-15 00:23:14,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2918850.0, ans=0.125
2024-08-15 00:23:15,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2050, loss[loss=0.1005, beats_loss=0.008869, ecapa_loss=0.0001666, whisper_loss=0.08996, over 16120.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001496, whisper_loss=0.08976, over 3823244.97 frames. ], batch size: 63, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:23:23,382 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 from AS
2024-08-15 00:23:28,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2918850.0, ans=0.125
2024-08-15 00:23:41,566 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 10 from Vox, 28 from AS
2024-08-15 00:23:50,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2919050.0, ans=0.1
2024-08-15 00:23:50,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.42 vs. limit=10.0
2024-08-15 00:23:55,248 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 00:23:59,464 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 from AS
2024-08-15 00:24:08,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2919150.0, ans=0.1
2024-08-15 00:24:36,900 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2100, loss[loss=0.09194, beats_loss=0.01399, ecapa_loss=0.000155, whisper_loss=0.0764, over 21444.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001499, whisper_loss=0.08984, over 3804558.13 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:24:37,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5
2024-08-15 00:24:58,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.17 vs. limit=22.5
2024-08-15 00:25:00,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0
2024-08-15 00:25:07,595 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 from AS
2024-08-15 00:25:10,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2919550.0, ans=0.0
2024-08-15 00:25:15,909 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 from AS
2024-08-15 00:25:18,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2919550.0, ans=0.2
2024-08-15 00:25:31,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.305e+01 2.496e+01 2.863e+01 3.632e+01, threshold=4.993e+01, percent-clipped=0.0
2024-08-15 00:25:32,432 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 from AS
2024-08-15 00:25:32,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2919650.0, ans=0.2
2024-08-15 00:25:39,754 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 from AS
2024-08-15 00:25:41,938 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 from AS
2024-08-15 00:25:57,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2919850.0, ans=0.0
2024-08-15 00:25:57,923 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2150, loss[loss=0.104, beats_loss=0.01239, ecapa_loss=0.0001277, whisper_loss=0.09031, over 16324.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001503, whisper_loss=0.09078, over 3817979.30 frames. ], batch size: 63, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:26:07,031 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 from AS
2024-08-15 00:26:21,510 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0
2024-08-15 00:26:33,909 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 00:27:05,509 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS
2024-08-15 00:27:13,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0
2024-08-15 00:27:26,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2200, loss[loss=0.1103, beats_loss=0.01185, ecapa_loss=0.0001272, whisper_loss=0.09716, over 23433.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01052, ecapa_loss=0.0001502, whisper_loss=0.09168, over 3833344.87 frames. ], batch size: 92, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:27:57,611 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 from AS
2024-08-15 00:28:07,267 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 from AS
2024-08-15 00:28:10,575 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.922e-02
2024-08-15 00:28:13,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2920550.0, ans=0.0
2024-08-15 00:28:22,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.682e+01 3.041e+01 4.507e+01, threshold=5.364e+01, percent-clipped=0.0
2024-08-15 00:28:45,094 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 from AS
2024-08-15 00:28:49,490 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2250, loss[loss=0.1239, beats_loss=0.009101, ecapa_loss=0.0001622, whisper_loss=0.1132, over 18409.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001495, whisper_loss=0.09098, over 3858972.57 frames. ], batch size: 72, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 00:29:04,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2920950.0, ans=0.0
2024-08-15 00:29:04,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2920950.0, ans=0.125
2024-08-15 00:29:19,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2920950.0, ans=0.0
2024-08-15 00:29:26,033 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts.
22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 00:29:28,893 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 00:30:05,638 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 00:30:06,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=22.5 2024-08-15 00:30:11,314 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2300, loss[loss=0.08059, beats_loss=0.01293, ecapa_loss=0.000136, whisper_loss=0.0663, over 17066.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001486, whisper_loss=0.09128, over 3901631.86 frames. ], batch size: 67, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:30:13,652 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-15 00:30:31,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2921450.0, ans=0.07 2024-08-15 00:30:49,492 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-15 00:30:51,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.91 vs. limit=10.0 2024-08-15 00:31:04,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.272e+01 2.490e+01 2.826e+01 4.749e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-15 00:31:06,440 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-15 00:31:06,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2024-08-15 00:31:08,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2921650.0, ans=0.0 2024-08-15 00:31:21,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2921750.0, ans=10.0 2024-08-15 00:31:31,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2921850.0, ans=0.125 2024-08-15 00:31:32,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2350, loss[loss=0.1047, beats_loss=0.01119, ecapa_loss=0.0001354, whisper_loss=0.09215, over 22325.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001501, whisper_loss=0.09127, over 3874973.06 frames. ], batch size: 86, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:31:33,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.17 vs. limit=10.0 2024-08-15 00:31:52,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2921950.0, ans=0.125 2024-08-15 00:32:18,119 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 29 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 00:32:19,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2922050.0, ans=0.125 2024-08-15 00:32:36,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2922250.0, ans=0.0 2024-08-15 00:32:46,765 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 00:32:49,637 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 00:32:53,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2400, loss[loss=0.1, beats_loss=0.01081, ecapa_loss=0.0001141, whisper_loss=0.08808, over 15847.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.00015, whisper_loss=0.0921, over 3890202.99 frames. ], batch size: 58, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:32:54,307 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 00:32:55,893 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 00:33:41,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2922550.0, ans=0.015 2024-08-15 00:33:43,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2922650.0, ans=0.125 2024-08-15 00:33:46,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. 
limit=15.0 2024-08-15 00:33:50,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.275e+01 2.495e+01 2.898e+01 2.121e+02, threshold=4.990e+01, percent-clipped=1.0 2024-08-15 00:33:51,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2922650.0, ans=0.125 2024-08-15 00:33:54,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2922650.0, ans=0.125 2024-08-15 00:34:06,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2922750.0, ans=0.05 2024-08-15 00:34:15,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2450, loss[loss=0.1097, beats_loss=0.0105, ecapa_loss=0.000217, whisper_loss=0.097, over 21211.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0105, ecapa_loss=0.000151, whisper_loss=0.09187, over 3878956.69 frames. ], batch size: 91, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:34:28,131 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 00:34:37,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2922950.0, ans=0.125 2024-08-15 00:34:45,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2922950.0, ans=0.125 2024-08-15 00:34:49,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2923050.0, ans=0.125 2024-08-15 00:34:53,835 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 00:34:53,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2923050.0, ans=0.5 2024-08-15 00:34:59,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2923050.0, ans=0.125 2024-08-15 00:35:01,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2923050.0, ans=0.125 2024-08-15 00:35:23,554 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 00:35:28,911 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.594e-02 2024-08-15 00:35:31,332 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-15 00:35:37,607 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 00:35:38,639 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2500, loss[loss=0.1019, beats_loss=0.01129, ecapa_loss=0.0001618, whisper_loss=0.08896, over 17992.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01047, ecapa_loss=0.0001504, whisper_loss=0.09213, over 3872290.55 frames. ], batch size: 75, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:35:40,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2923350.0, ans=0.1 2024-08-15 00:36:00,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2923450.0, ans=0.0 2024-08-15 00:36:15,865 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-15 00:36:27,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2923650.0, ans=0.2 2024-08-15 00:36:32,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.328e+01 2.584e+01 2.965e+01 7.495e+01, threshold=5.168e+01, percent-clipped=2.0 2024-08-15 00:36:32,465 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 00:36:41,200 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 00:36:42,181 WARNING [optim.py:496] (1/4) Scaling gradients by 0.026615602895617485, model_norm_threshold=51.67815017700195 2024-08-15 00:36:42,372 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.809e+05, grad_sumsq=6.809e+05, orig_rms_sq=1.000e+00 2024-08-15 00:36:56,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2923750.0, ans=0.0 2024-08-15 00:36:59,185 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2550, loss[loss=0.09616, beats_loss=0.01041, ecapa_loss=0.0001645, whisper_loss=0.08411, over 16621.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01049, ecapa_loss=0.0001504, whisper_loss=0.09246, over 3907315.57 frames. ], batch size: 70, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:37:05,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2923850.0, ans=0.125 2024-08-15 00:37:12,894 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-15 00:37:14,214 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 00:37:39,849 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 00:37:44,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2924150.0, ans=0.125 2024-08-15 00:38:11,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2924250.0, ans=0.2 2024-08-15 00:38:16,898 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2600, loss[loss=0.09256, beats_loss=0.009445, ecapa_loss=0.0001587, whisper_loss=0.08153, over 16091.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01052, ecapa_loss=0.0001499, whisper_loss=0.09243, over 3898641.33 frames. ], batch size: 65, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:38:26,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2924350.0, ans=0.09899494936611666 2024-08-15 00:38:34,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2924450.0, ans=0.125 2024-08-15 00:38:52,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-15 00:39:00,137 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 00:39:04,125 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 00:39:08,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.339e+01 2.635e+01 2.939e+01 1.942e+03, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 00:39:09,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2924650.0, ans=0.2 2024-08-15 00:39:13,847 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 00:39:14,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2924650.0, ans=0.125 2024-08-15 00:39:21,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2924750.0, ans=0.125 2024-08-15 00:39:28,249 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-15 00:39:32,856 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2650, loss[loss=0.09035, beats_loss=0.0108, ecapa_loss=0.0001563, whisper_loss=0.07799, over 15394.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01054, ecapa_loss=0.0001502, whisper_loss=0.09214, over 3888410.31 frames. ], batch size: 60, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:39:35,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=22.5 2024-08-15 00:39:44,515 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-15 00:39:52,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2924950.0, ans=0.125 2024-08-15 00:40:19,564 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 00:40:23,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.39 vs. limit=6.0 2024-08-15 00:40:26,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2925150.0, ans=0.125 2024-08-15 00:40:28,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2925150.0, ans=0.1 2024-08-15 00:40:50,042 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2700, loss[loss=0.1123, beats_loss=0.008994, ecapa_loss=0.0001452, whisper_loss=0.1019, over 20309.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001499, whisper_loss=0.09124, over 3882632.11 frames. ], batch size: 79, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:40:53,694 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-15 00:41:12,233 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 00:41:27,875 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 00:41:38,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=15.0 2024-08-15 00:41:43,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.279e+01 2.490e+01 2.726e+01 4.419e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-15 00:41:44,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2925650.0, ans=0.125 2024-08-15 00:41:44,748 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.809e+00 2024-08-15 00:41:45,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2925650.0, ans=0.125 2024-08-15 00:41:56,676 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 00:41:59,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2925750.0, ans=0.2 2024-08-15 00:42:04,596 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-15 00:42:09,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2750, loss[loss=0.09371, beats_loss=0.01076, ecapa_loss=0.0001286, whisper_loss=0.08166, over 16726.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001498, whisper_loss=0.09086, over 3875091.18 frames. 
], batch size: 63, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:42:12,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2925850.0, ans=0.125 2024-08-15 00:42:37,589 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.353e+05 2024-08-15 00:42:40,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2926050.0, ans=0.125 2024-08-15 00:42:52,901 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 25 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-15 00:43:04,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2926150.0, ans=0.125 2024-08-15 00:43:11,260 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-15 00:43:22,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2926250.0, ans=0.0 2024-08-15 00:43:32,316 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2800, loss[loss=0.1118, beats_loss=0.01099, ecapa_loss=0.0001476, whisper_loss=0.09933, over 21082.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.00015, whisper_loss=0.09073, over 3832362.99 frames. ], batch size: 87, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:43:48,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2926350.0, ans=0.2 2024-08-15 00:43:53,921 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 00:44:00,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2926450.0, ans=0.0 2024-08-15 00:44:06,526 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 00:44:10,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2926550.0, ans=0.2 2024-08-15 00:44:22,413 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.152e-02 2024-08-15 00:44:31,126 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.375e+01 2.624e+01 2.895e+01 7.200e+01, threshold=5.247e+01, percent-clipped=1.0 2024-08-15 00:45:00,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2850, loss[loss=0.1166, beats_loss=0.01053, ecapa_loss=0.0001448, whisper_loss=0.1047, over 23318.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001508, whisper_loss=0.09101, over 3839694.51 frames. ], batch size: 92, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:45:11,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2926850.0, ans=0.125 2024-08-15 00:45:35,730 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
37 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 00:45:36,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2927050.0, ans=0.125 2024-08-15 00:45:44,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2927050.0, ans=0.125 2024-08-15 00:45:53,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2927150.0, ans=0.0 2024-08-15 00:46:24,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2900, loss[loss=0.1069, beats_loss=0.01197, ecapa_loss=0.0001628, whisper_loss=0.09331, over 20501.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001512, whisper_loss=0.09079, over 3833700.75 frames. ], batch size: 83, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:46:43,345 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 00:46:50,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=12.0 2024-08-15 00:46:52,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2927450.0, ans=0.1 2024-08-15 00:47:24,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.340e+01 2.621e+01 2.930e+01 9.659e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-15 00:47:25,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0 2024-08-15 00:47:36,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2927750.0, ans=0.125 2024-08-15 00:47:38,666 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
25 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-15 00:47:44,490 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 00:47:52,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 2950, loss[loss=0.06789, beats_loss=0.01624, ecapa_loss=0.0001342, whisper_loss=0.0503, over 16621.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001524, whisper_loss=0.0904, over 3832319.10 frames. ], batch size: 70, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:48:15,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2927950.0, ans=0.0 2024-08-15 00:48:15,800 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2024-08-15 00:48:26,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2927950.0, ans=0.05 2024-08-15 00:48:38,698 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 00:48:47,998 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-15 00:49:11,375 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 00:49:11,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2928250.0, ans=0.125 2024-08-15 00:49:22,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3000, loss[loss=0.09724, beats_loss=0.009855, ecapa_loss=0.0001508, whisper_loss=0.08588, over 23313.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.09078, over 3865308.92 frames. 
], batch size: 91, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:49:22,671 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 00:49:49,517 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8736, 1.6892, 2.9420, 2.8916], device='cuda:1') 2024-08-15 00:50:02,595 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005339, whisper_loss=0.248, over 922467.00 frames. 2024-08-15 00:50:21,632 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-15 00:51:04,870 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5886, 2.0258, 2.1748, 1.1737], device='cuda:1') 2024-08-15 00:51:09,830 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0798, 3.1506, 3.4211, 3.1614], device='cuda:1') 2024-08-15 00:52:15,944 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 00:52:15,948 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 00:52:23,316 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 00:52:40,851 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-15 00:52:47,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.32 vs. 
limit=22.5 2024-08-15 00:52:50,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2928550.0, ans=0.125 2024-08-15 00:52:53,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2928550.0, ans=0.125 2024-08-15 00:53:01,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2928650.0, ans=0.0 2024-08-15 00:53:08,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.494e+01 2.724e+01 3.053e+01 2.712e+02, threshold=5.448e+01, percent-clipped=2.0 2024-08-15 00:53:17,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2928750.0, ans=0.09899494936611666 2024-08-15 00:53:22,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2928750.0, ans=0.125 2024-08-15 00:53:37,454 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3050, loss[loss=0.1339, beats_loss=0.007971, ecapa_loss=0.0001532, whisper_loss=0.1244, over 15017.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001534, whisper_loss=0.09125, over 3892871.02 frames. ], batch size: 54, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:53:53,263 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 00:54:03,556 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 00:54:42,859 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
26 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-15 00:54:43,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2929150.0, ans=0.0 2024-08-15 00:54:47,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2929250.0, ans=0.0 2024-08-15 00:55:06,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3100, loss[loss=0.1207, beats_loss=0.0111, ecapa_loss=0.0001521, whisper_loss=0.1081, over 23236.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.0001552, whisper_loss=0.09178, over 3877028.15 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:56:02,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=15.0 2024-08-15 00:56:05,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.234e+01 2.421e+01 2.829e+01 3.932e+01, threshold=4.842e+01, percent-clipped=0.0 2024-08-15 00:56:15,920 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 00:56:17,634 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 00:56:24,054 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 00:56:32,704 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 00:56:34,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3150, loss[loss=0.09993, beats_loss=0.01198, ecapa_loss=0.0001073, whisper_loss=0.08688, over 17706.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.0001552, whisper_loss=0.09148, over 3833600.73 frames. 
], batch size: 66, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:57:00,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2929950.0, ans=0.0 2024-08-15 00:57:04,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2929950.0, ans=0.125 2024-08-15 00:57:12,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2930050.0, ans=0.2 2024-08-15 00:57:14,070 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 00:57:24,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2930150.0, ans=0.125 2024-08-15 00:57:25,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2024-08-15 00:57:32,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2930150.0, ans=0.2 2024-08-15 00:57:35,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2930150.0, ans=0.2 2024-08-15 00:57:35,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2930150.0, ans=0.125 2024-08-15 00:57:36,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2930150.0, ans=0.0 2024-08-15 00:57:44,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.48 vs. 
limit=15.0 2024-08-15 00:57:58,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3200, loss[loss=0.1098, beats_loss=0.01093, ecapa_loss=0.0001411, whisper_loss=0.0975, over 22752.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001534, whisper_loss=0.09126, over 3855820.53 frames. ], batch size: 91, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:57:59,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2024-08-15 00:58:01,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2930350.0, ans=0.125 2024-08-15 00:58:11,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2930350.0, ans=0.125 2024-08-15 00:58:14,514 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 00:58:19,704 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 00:58:22,622 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 00:58:58,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.340e+01 2.607e+01 2.888e+01 4.627e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-15 00:59:00,345 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 00:59:10,511 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07108230143785477, model_norm_threshold=52.141944885253906 2024-08-15 00:59:10,696 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.209e+04, grad_sumsq=9.209e+04, orig_rms_sq=1.000e+00 2024-08-15 00:59:10,931 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 00:59:17,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2930750.0, ans=0.125 2024-08-15 00:59:26,758 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3250, loss[loss=0.08766, beats_loss=0.01345, ecapa_loss=0.0001426, whisper_loss=0.07278, over 16298.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001534, whisper_loss=0.09095, over 3851090.48 frames. ], batch size: 67, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:59:34,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2930850.0, ans=0.125 2024-08-15 00:59:49,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2930950.0, ans=0.125 2024-08-15 00:59:54,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2930950.0, ans=10.0 2024-08-15 01:00:04,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2931050.0, ans=0.0 2024-08-15 01:00:05,292 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 01:00:37,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2931250.0, ans=0.125 2024-08-15 01:00:51,765 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-15 01:00:53,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3300, loss[loss=0.0814, beats_loss=0.01436, ecapa_loss=0.0001214, whisper_loss=0.06582, over 16368.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001536, whisper_loss=0.09026, over 3849548.63 frames. ], batch size: 67, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:01:37,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2931550.0, ans=0.0 2024-08-15 01:01:47,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.317e+01 2.622e+01 2.887e+01 7.335e+02, threshold=5.244e+01, percent-clipped=2.0 2024-08-15 01:01:55,234 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-15 01:02:13,744 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 01:02:14,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3350, loss[loss=0.07957, beats_loss=0.01146, ecapa_loss=0.0001273, whisper_loss=0.06684, over 18483.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.000153, whisper_loss=0.08986, over 3842593.92 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:02:47,377 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:02:55,455 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
16 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 01:03:00,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2932050.0, ans=0.05 2024-08-15 01:03:03,727 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5 2024-08-15 01:03:15,467 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-15 01:03:29,837 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 25 from Vox, 16 fro AS 2024-08-15 01:03:38,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2932250.0, ans=0.2 2024-08-15 01:03:45,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3400, loss[loss=0.09438, beats_loss=0.0114, ecapa_loss=0.0001475, whisper_loss=0.0815, over 17253.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001529, whisper_loss=0.09021, over 3871692.67 frames. ], batch size: 71, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:03:52,049 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
14 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 01:04:29,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2932550.0, ans=0.025 2024-08-15 01:04:44,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.349e+01 2.635e+01 2.975e+01 2.960e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 01:04:56,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2932750.0, ans=0.0 2024-08-15 01:04:56,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2932750.0, ans=0.125 2024-08-15 01:05:14,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3450, loss[loss=0.09473, beats_loss=0.012, ecapa_loss=0.0001295, whisper_loss=0.08144, over 19767.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001519, whisper_loss=0.0904, over 3879397.47 frames. ], batch size: 79, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:05:33,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2932950.0, ans=0.2 2024-08-15 01:05:40,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2932950.0, ans=0.1 2024-08-15 01:05:42,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-08-15 01:05:44,022 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
22 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-15 01:05:44,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2932950.0, ans=0.125 2024-08-15 01:05:44,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2932950.0, ans=0.125 2024-08-15 01:05:44,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2932950.0, ans=0.2 2024-08-15 01:05:47,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2932950.0, ans=0.125 2024-08-15 01:05:50,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2933050.0, ans=0.0 2024-08-15 01:05:59,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2933050.0, ans=0.09899494936611666 2024-08-15 01:06:04,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2933050.0, ans=0.125 2024-08-15 01:06:11,914 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 01:06:34,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-08-15 01:06:35,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2933250.0, ans=0.95 2024-08-15 01:06:36,531 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 01:06:47,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3500, loss[loss=0.1098, beats_loss=0.01046, ecapa_loss=0.0001835, whisper_loss=0.09752, over 17405.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001523, whisper_loss=0.09083, over 3893513.04 frames. ], batch size: 72, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:06:52,441 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0579170286655426, model_norm_threshold=52.703590393066406 2024-08-15 01:06:52,618 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.021e+05, grad_sumsq=2.962e+04, orig_rms_sq=3.448e+00 2024-08-15 01:06:59,561 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 01:07:22,699 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 01:07:38,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2933650.0, ans=0.125 2024-08-15 01:07:44,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.354e+01 2.571e+01 2.919e+01 9.100e+02, threshold=5.142e+01, percent-clipped=1.0 2024-08-15 01:07:44,664 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 01:08:10,691 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3550, loss[loss=0.07396, beats_loss=0.01562, ecapa_loss=9.791e-05, whisper_loss=0.05737, over 15695.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001519, whisper_loss=0.0909, over 3921682.98 frames. ], batch size: 63, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:08:23,793 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 01:08:27,456 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 01:08:35,209 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
23 from LS+wenet, 18 from Vox, 12 fro AS 2024-08-15 01:08:54,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2934050.0, ans=0.125 2024-08-15 01:08:55,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-15 01:09:08,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2934150.0, ans=0.125 2024-08-15 01:09:33,288 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3600, loss[loss=0.1021, beats_loss=0.01134, ecapa_loss=0.000156, whisper_loss=0.08916, over 22677.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001518, whisper_loss=0.09126, over 3909173.56 frames. ], batch size: 93, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:09:36,266 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 01:09:36,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2934350.0, ans=0.125 2024-08-15 01:09:41,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0 2024-08-15 01:10:12,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2934550.0, ans=0.125 2024-08-15 01:10:21,962 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 01:10:30,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.21 vs. 
limit=22.5 2024-08-15 01:10:31,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.251e+01 2.514e+01 2.875e+01 6.843e+01, threshold=5.029e+01, percent-clipped=1.0 2024-08-15 01:10:52,590 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 14 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 01:10:56,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2934750.0, ans=0.1 2024-08-15 01:10:58,974 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3650, loss[loss=0.0889, beats_loss=0.01037, ecapa_loss=0.0001512, whisper_loss=0.07701, over 20824.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.09161, over 3907112.84 frames. ], batch size: 84, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:11:11,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.31 vs. limit=10.0 2024-08-15 01:11:20,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-15 01:11:41,327 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 01:11:48,158 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 01:12:00,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2935150.0, ans=0.0 2024-08-15 01:12:03,628 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-15 01:12:19,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3700, loss[loss=0.09941, beats_loss=0.01215, ecapa_loss=0.0001498, whisper_loss=0.08577, over 22174.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001515, whisper_loss=0.09174, over 3901499.23 frames. ], batch size: 93, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:12:31,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0 2024-08-15 01:12:37,982 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 01:12:49,857 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 01:12:52,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2935450.0, ans=0.1 2024-08-15 01:13:06,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2935550.0, ans=0.0 2024-08-15 01:13:13,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2935650.0, ans=0.125 2024-08-15 01:13:17,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.268e+01 2.455e+01 2.851e+01 9.066e+01, threshold=4.910e+01, percent-clipped=1.0 2024-08-15 01:13:27,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2935750.0, ans=0.125 2024-08-15 01:13:45,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3750, loss[loss=0.1334, beats_loss=0.007946, ecapa_loss=0.0001484, whisper_loss=0.124, over 21126.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001514, whisper_loss=0.09173, over 3898831.71 frames. 
], batch size: 77, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:13:58,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2935850.0, ans=0.0 2024-08-15 01:14:37,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2936150.0, ans=0.0 2024-08-15 01:14:56,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2936250.0, ans=0.2 2024-08-15 01:14:59,219 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 01:15:04,479 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 01:15:11,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3800, loss[loss=0.09325, beats_loss=0.01211, ecapa_loss=0.0001306, whisper_loss=0.07983, over 18560.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001507, whisper_loss=0.09123, over 3874492.36 frames. ], batch size: 76, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:15:44,260 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 01:15:47,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2936550.0, ans=0.125 2024-08-15 01:15:56,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-15 01:16:00,802 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 01:16:01,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2936650.0, ans=0.0 2024-08-15 01:16:02,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2936650.0, ans=0.125 2024-08-15 01:16:06,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.293e+01 2.527e+01 3.101e+01 3.900e+02, threshold=5.055e+01, percent-clipped=2.0 2024-08-15 01:16:15,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2936750.0, ans=0.1 2024-08-15 01:16:34,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3850, loss[loss=0.1203, beats_loss=0.01107, ecapa_loss=0.0001302, whisper_loss=0.1079, over 23946.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001506, whisper_loss=0.09143, over 3864090.17 frames. ], batch size: 93, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:16:44,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2936850.0, ans=0.2 2024-08-15 01:17:23,680 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 01:17:34,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=12.0 2024-08-15 01:17:38,522 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 01:17:40,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2937150.0, ans=0.2 2024-08-15 01:18:01,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3900, loss[loss=0.1014, beats_loss=0.01103, ecapa_loss=0.0001805, whisper_loss=0.08852, over 19490.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01056, ecapa_loss=0.000152, whisper_loss=0.0923, over 3882672.03 frames. ], batch size: 82, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:18:23,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-15 01:18:54,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-15 01:18:55,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2937650.0, ans=0.1 2024-08-15 01:18:59,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.342e+01 2.596e+01 2.902e+01 5.795e+01, threshold=5.192e+01, percent-clipped=1.0 2024-08-15 01:19:17,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2937750.0, ans=0.1 2024-08-15 01:19:27,347 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 3950, loss[loss=0.09587, beats_loss=0.009938, ecapa_loss=0.0001579, whisper_loss=0.08435, over 14105.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0106, ecapa_loss=0.0001521, whisper_loss=0.09238, over 3890273.31 frames. 
], batch size: 54, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:19:28,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2937850.0, ans=0.1 2024-08-15 01:19:31,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2937850.0, ans=0.125 2024-08-15 01:19:45,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2937950.0, ans=0.0 2024-08-15 01:20:03,308 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 01:20:05,421 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 01:20:22,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0 2024-08-15 01:20:33,581 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 01:20:53,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2938250.0, ans=0.07 2024-08-15 01:20:56,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4000, loss[loss=0.09568, beats_loss=0.008589, ecapa_loss=0.0001603, whisper_loss=0.08549, over 14751.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.0001525, whisper_loss=0.09163, over 3896151.43 frames. 
], batch size: 57, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:21:03,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2938350.0, ans=0.125 2024-08-15 01:21:07,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2938350.0, ans=0.2 2024-08-15 01:21:15,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2938450.0, ans=0.0 2024-08-15 01:21:19,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-15 01:21:23,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2938450.0, ans=0.125 2024-08-15 01:21:29,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2938450.0, ans=0.125 2024-08-15 01:21:33,551 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 01:21:56,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2024-08-15 01:21:58,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.427e+01 2.607e+01 2.957e+01 4.809e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-15 01:22:01,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2938650.0, ans=0.2 2024-08-15 01:22:04,379 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 01:22:07,982 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
35 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-15 01:22:19,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2938750.0, ans=0.125 2024-08-15 01:22:27,511 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4050, loss[loss=0.1, beats_loss=0.009259, ecapa_loss=0.0001731, whisper_loss=0.08904, over 18494.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01063, ecapa_loss=0.0001535, whisper_loss=0.09184, over 3916677.70 frames. ], batch size: 76, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:22:29,937 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 34 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 01:22:44,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2938850.0, ans=0.1 2024-08-15 01:22:45,697 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 01:22:50,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2938950.0, ans=0.125 2024-08-15 01:22:59,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2938950.0, ans=0.1 2024-08-15 01:23:09,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2939050.0, ans=10.0 2024-08-15 01:23:19,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2024-08-15 01:23:58,947 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4100, loss[loss=0.1119, beats_loss=0.009337, ecapa_loss=0.0001165, whisper_loss=0.1014, over 21577.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.000154, whisper_loss=0.09162, over 3910450.39 frames. 
], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:24:12,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0 2024-08-15 01:24:15,415 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 01:24:28,356 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 01:24:37,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.23 vs. limit=22.5 2024-08-15 01:24:56,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2939650.0, ans=0.125 2024-08-15 01:24:58,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.312e+01 2.568e+01 2.993e+01 3.291e+02, threshold=5.136e+01, percent-clipped=2.0 2024-08-15 01:25:00,240 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.811e-02 2024-08-15 01:25:02,367 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-15 01:25:05,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2939650.0, ans=0.125 2024-08-15 01:25:09,097 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 01:25:21,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2939750.0, ans=0.125 2024-08-15 01:25:23,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2939750.0, ans=0.0 2024-08-15 01:25:26,396 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4150, loss[loss=0.1023, beats_loss=0.01025, ecapa_loss=0.000167, whisper_loss=0.0904, over 22296.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001544, whisper_loss=0.0914, over 3897493.49 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:25:28,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2939850.0, ans=0.05 2024-08-15 01:25:49,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=15.0 2024-08-15 01:25:55,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.36 vs. limit=22.5 2024-08-15 01:26:00,252 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:26:11,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0 2024-08-15 01:26:15,298 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 01:26:18,583 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
11 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 01:26:20,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2940150.0, ans=0.2 2024-08-15 01:26:24,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2940150.0, ans=22.5 2024-08-15 01:26:48,383 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 01:26:48,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2940250.0, ans=0.2 2024-08-15 01:26:52,921 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4200, loss[loss=0.1176, beats_loss=0.0109, ecapa_loss=0.0001348, whisper_loss=0.1053, over 17166.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.0001534, whisper_loss=0.09156, over 3912719.21 frames. ], batch size: 66, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:26:53,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2940350.0, ans=0.0 2024-08-15 01:26:59,445 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 01:26:59,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2940350.0, ans=0.1 2024-08-15 01:27:02,942 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 01:27:07,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2940450.0, ans=0.1 2024-08-15 01:27:10,667 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 34 from Vox, 26 fro AS 2024-08-15 01:27:15,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2940450.0, ans=0.0 2024-08-15 01:27:44,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.316e+01 2.522e+01 2.873e+01 3.693e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-15 01:27:47,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2940650.0, ans=0.125 2024-08-15 01:27:47,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2940650.0, ans=0.125 2024-08-15 01:27:49,896 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 33 from Vox, 22 fro AS 2024-08-15 01:27:58,389 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 01:28:06,734 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4250, loss[loss=0.08842, beats_loss=0.012, ecapa_loss=0.0001589, whisper_loss=0.07483, over 21759.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001531, whisper_loss=0.09125, over 3893417.45 frames. ], batch size: 91, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:28:26,272 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 01:28:29,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.36 vs. 
limit=15.0 2024-08-15 01:28:39,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2941050.0, ans=0.0 2024-08-15 01:28:46,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2941150.0, ans=0.125 2024-08-15 01:29:13,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4300, loss[loss=0.1069, beats_loss=0.008631, ecapa_loss=0.0001373, whisper_loss=0.09689, over 23102.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.000153, whisper_loss=0.09169, over 3901034.08 frames. ], batch size: 87, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:29:14,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-15 01:29:25,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2941350.0, ans=0.125 2024-08-15 01:29:27,365 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 01:29:58,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.262e+01 2.467e+01 2.860e+01 4.963e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-15 01:30:14,393 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 01:30:19,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4350, loss[loss=0.1104, beats_loss=0.007189, ecapa_loss=0.000176, whisper_loss=0.1014, over 14046.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001545, whisper_loss=0.09132, over 3867735.46 frames. 
], batch size: 55, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:30:21,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2941850.0, ans=0.0 2024-08-15 01:30:22,841 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 01:30:24,160 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 01:31:13,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2024-08-15 01:31:25,791 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4400, loss[loss=0.1081, beats_loss=0.01052, ecapa_loss=0.0001681, whisper_loss=0.09587, over 22527.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001538, whisper_loss=0.09156, over 3869717.06 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:31:30,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2942350.0, ans=0.07 2024-08-15 01:31:39,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2942450.0, ans=0.125 2024-08-15 01:31:41,341 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-15 01:31:44,136 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
42 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 01:31:47,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2942450.0, ans=0.05 2024-08-15 01:32:02,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2942550.0, ans=0.125 2024-08-15 01:32:06,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2942650.0, ans=0.0 2024-08-15 01:32:09,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.385e+01 2.562e+01 2.975e+01 4.289e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-15 01:32:18,237 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-15 01:32:30,906 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4450, loss[loss=0.1046, beats_loss=0.01117, ecapa_loss=0.0001383, whisper_loss=0.092, over 22642.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001541, whisper_loss=0.09131, over 3916482.10 frames. 
], batch size: 89, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:32:32,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2942850.0, ans=0.1 2024-08-15 01:32:35,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2942850.0, ans=0.1 2024-08-15 01:32:43,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2942950.0, ans=0.0 2024-08-15 01:32:53,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2942950.0, ans=0.125 2024-08-15 01:33:15,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2943150.0, ans=0.0 2024-08-15 01:33:36,218 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4500, loss[loss=0.1237, beats_loss=0.008151, ecapa_loss=0.0001604, whisper_loss=0.114, over 23000.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001539, whisper_loss=0.0912, over 3923776.18 frames. ], batch size: 90, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:33:55,247 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 01:34:03,616 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 01:34:13,751 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
24 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 01:34:19,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2943650.0, ans=0.2 2024-08-15 01:34:23,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.368e+01 2.661e+01 3.187e+01 2.204e+02, threshold=5.323e+01, percent-clipped=1.0 2024-08-15 01:34:28,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-15 01:34:42,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4550, loss[loss=0.1107, beats_loss=0.01206, ecapa_loss=0.0001292, whisper_loss=0.0973, over 23621.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001534, whisper_loss=0.09079, over 3925680.04 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:34:52,130 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 01:34:52,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2943850.0, ans=0.125 2024-08-15 01:34:53,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-15 01:34:59,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.78 vs. 
limit=15.0 2024-08-15 01:35:04,641 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:35:05,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2943950.0, ans=0.0 2024-08-15 01:35:22,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2944150.0, ans=0.025 2024-08-15 01:35:25,268 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 01:35:28,952 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-15 01:35:48,575 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4600, loss[loss=0.09525, beats_loss=0.0136, ecapa_loss=0.0001401, whisper_loss=0.08025, over 20465.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001539, whisper_loss=0.09053, over 3912795.18 frames. ], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:36:04,543 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 01:36:12,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2944450.0, ans=0.125 2024-08-15 01:36:19,667 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 01:36:22,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2944550.0, ans=0.125 2024-08-15 01:36:24,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2944550.0, ans=0.125 2024-08-15 01:36:33,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.310e+01 2.603e+01 2.915e+01 4.398e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 01:36:34,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-15 01:36:42,096 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 01:36:53,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4650, loss[loss=0.09554, beats_loss=0.009789, ecapa_loss=0.0001969, whisper_loss=0.08379, over 21294.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001543, whisper_loss=0.09048, over 3907191.63 frames. ], batch size: 94, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:36:55,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2944850.0, ans=0.125 2024-08-15 01:37:00,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2944850.0, ans=0.07 2024-08-15 01:37:03,124 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-15 01:37:05,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.19 vs. 
limit=15.0 2024-08-15 01:37:08,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2944950.0, ans=0.125 2024-08-15 01:37:08,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2944950.0, ans=0.0 2024-08-15 01:37:09,968 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 01:37:16,234 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 01:37:16,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2944950.0, ans=0.125 2024-08-15 01:37:18,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-15 01:37:27,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-15 01:37:31,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.99 vs. limit=15.0 2024-08-15 01:37:31,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.68 vs. limit=10.0 2024-08-15 01:37:36,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=2945150.0, ans=12.0 2024-08-15 01:37:41,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2945150.0, ans=0.07 2024-08-15 01:37:45,090 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 01:37:45,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2024-08-15 01:37:46,435 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 01:37:48,965 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 01:37:55,403 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 01:37:59,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4700, loss[loss=0.08715, beats_loss=0.01097, ecapa_loss=0.0001298, whisper_loss=0.07488, over 14514.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001537, whisper_loss=0.09144, over 3891048.41 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:38:02,417 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-15 01:38:03,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2945350.0, ans=0.125 2024-08-15 01:38:12,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2945450.0, ans=0.0 2024-08-15 01:38:24,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2945550.0, ans=0.125 2024-08-15 01:38:27,075 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 01:38:36,132 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
28 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 01:38:37,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2945650.0, ans=0.125 2024-08-15 01:38:44,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.361e+01 2.586e+01 2.935e+01 3.925e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-15 01:38:45,058 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 01:38:48,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2945650.0, ans=0.2 2024-08-15 01:38:49,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2945650.0, ans=0.125 2024-08-15 01:38:52,996 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 01:38:59,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2945750.0, ans=0.125 2024-08-15 01:39:05,277 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4750, loss[loss=0.105, beats_loss=0.01076, ecapa_loss=0.0001486, whisper_loss=0.0928, over 19329.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001542, whisper_loss=0.09128, over 3906637.60 frames. ], batch size: 77, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:39:26,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2945950.0, ans=0.0 2024-08-15 01:39:36,804 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
27 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-15 01:39:39,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2946050.0, ans=0.125 2024-08-15 01:39:43,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2946050.0, ans=0.0 2024-08-15 01:40:00,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2946250.0, ans=0.1 2024-08-15 01:40:14,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4800, loss[loss=0.1165, beats_loss=0.01078, ecapa_loss=0.000126, whisper_loss=0.1044, over 18997.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001547, whisper_loss=0.09082, over 3906501.50 frames. ], batch size: 72, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:40:23,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2946350.0, ans=0.1 2024-08-15 01:40:32,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2946450.0, ans=0.125 2024-08-15 01:40:36,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2946450.0, ans=0.0 2024-08-15 01:40:37,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2946450.0, ans=10.0 2024-08-15 01:40:53,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2946550.0, ans=0.025 2024-08-15 01:40:54,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2946550.0, ans=0.1 2024-08-15 01:41:04,566 INFO 
[train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-15 01:41:07,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.219e+01 2.446e+01 2.733e+01 3.979e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-15 01:41:30,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4850, loss[loss=0.1126, beats_loss=0.01216, ecapa_loss=0.0001283, whisper_loss=0.09913, over 23614.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.000155, whisper_loss=0.09143, over 3890826.15 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:41:30,941 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 01:41:54,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2946950.0, ans=0.125 2024-08-15 01:41:55,802 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 01:41:57,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2946950.0, ans=0.04949747468305833 2024-08-15 01:42:07,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2947050.0, ans=0.1 2024-08-15 01:42:13,329 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-15 01:42:27,829 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 
28 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-15 01:42:40,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2947250.0, ans=0.95 2024-08-15 01:42:49,376 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4900, loss[loss=0.09536, beats_loss=0.01394, ecapa_loss=0.0001193, whisper_loss=0.08023, over 22990.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01058, ecapa_loss=0.0001543, whisper_loss=0.09197, over 3868902.49 frames. ], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:42:49,562 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 01:42:49,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2947350.0, ans=0.0 2024-08-15 01:42:59,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2947350.0, ans=0.0 2024-08-15 01:43:13,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2947450.0, ans=0.125 2024-08-15 01:43:37,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2947650.0, ans=0.125 2024-08-15 01:43:42,433 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.89 vs. 
limit=22.5 2024-08-15 01:43:42,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.371e+01 2.641e+01 2.917e+01 5.290e+01, threshold=5.283e+01, percent-clipped=1.0 2024-08-15 01:43:43,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2947650.0, ans=0.125 2024-08-15 01:43:43,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2947650.0, ans=0.0 2024-08-15 01:43:46,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2947650.0, ans=0.125 2024-08-15 01:43:53,654 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 18 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-15 01:43:54,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.55 vs. limit=10.0 2024-08-15 01:43:57,861 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-15 01:44:04,838 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 4950, loss[loss=0.1115, beats_loss=0.006962, ecapa_loss=0.0001689, whisper_loss=0.1029, over 15096.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001538, whisper_loss=0.09104, over 3869603.61 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:44:10,769 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 01:44:37,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2948050.0, ans=0.0 2024-08-15 01:44:46,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.77 vs. 
limit=22.5 2024-08-15 01:44:50,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2948150.0, ans=0.2 2024-08-15 01:44:51,017 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 01:45:01,622 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 01:45:03,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-15 01:45:13,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5000, loss[loss=0.07498, beats_loss=0.01457, ecapa_loss=0.0001614, whisper_loss=0.0588, over 20091.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001548, whisper_loss=0.09035, over 3861755.14 frames. ], batch size: 87, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:45:39,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2948550.0, ans=0.2 2024-08-15 01:45:41,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2948550.0, ans=0.125 2024-08-15 01:45:58,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.280e+01 2.510e+01 2.795e+01 4.349e+01, threshold=5.019e+01, percent-clipped=0.0 2024-08-15 01:46:08,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2948750.0, ans=0.125 2024-08-15 01:46:08,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-15 01:46:09,288 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
27 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 01:46:14,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2948750.0, ans=0.05 2024-08-15 01:46:17,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5050, loss[loss=0.1004, beats_loss=0.01087, ecapa_loss=0.0001721, whisper_loss=0.08776, over 17937.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01065, ecapa_loss=0.0001554, whisper_loss=0.09106, over 3870111.99 frames. ], batch size: 75, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:46:21,993 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 01:46:29,581 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-15 01:46:44,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2949050.0, ans=0.125 2024-08-15 01:46:45,969 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.718e+01 2024-08-15 01:47:01,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2024-08-15 01:47:17,681 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 01:47:23,235 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5100, loss[loss=0.1018, beats_loss=0.01157, ecapa_loss=0.0001447, whisper_loss=0.08883, over 15524.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001534, whisper_loss=0.09164, over 3871297.93 frames. 
], batch size: 63, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:47:23,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2949350.0, ans=0.09899494936611666 2024-08-15 01:47:45,622 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-15 01:47:48,041 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 01:47:50,639 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 01:47:54,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2949550.0, ans=0.0 2024-08-15 01:47:55,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2024-08-15 01:48:01,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2949650.0, ans=0.2 2024-08-15 01:48:08,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.324e+01 2.645e+01 2.910e+01 4.236e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-15 01:48:10,082 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 01:48:15,173 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 01:48:25,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2949750.0, ans=0.125 2024-08-15 01:48:27,069 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 01:48:28,092 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5150, loss[loss=0.1208, beats_loss=0.01034, ecapa_loss=0.0001658, whisper_loss=0.1088, over 21967.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001526, whisper_loss=0.09203, over 3901914.73 frames. ], batch size: 90, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:48:29,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. limit=5.0 2024-08-15 01:48:33,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2949850.0, ans=0.125 2024-08-15 01:48:34,469 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 01:48:37,084 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 01:48:37,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2949850.0, ans=0.125 2024-08-15 01:48:43,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2024-08-15 01:48:43,708 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 01:49:02,896 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 01:49:15,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2950150.0, ans=0.0 2024-08-15 01:49:23,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2950250.0, ans=0.125 2024-08-15 01:49:24,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2950250.0, ans=0.1 2024-08-15 01:49:26,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2024-08-15 01:49:32,161 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 01:49:33,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5200, loss[loss=0.09834, beats_loss=0.01281, ecapa_loss=0.0001249, whisper_loss=0.08427, over 21592.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01062, ecapa_loss=0.0001525, whisper_loss=0.09214, over 3902709.89 frames. ], batch size: 85, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:49:33,474 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 01:49:46,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2950450.0, ans=0.125 2024-08-15 01:49:54,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2950450.0, ans=0.2 2024-08-15 01:50:12,313 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 01:50:18,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2950650.0, ans=0.125 2024-08-15 01:50:19,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-08-15 01:50:19,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.309e+01 2.539e+01 2.841e+01 2.667e+02, threshold=5.077e+01, percent-clipped=2.0 2024-08-15 01:50:23,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=12.0 2024-08-15 01:50:33,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2950750.0, ans=0.0 2024-08-15 01:50:35,089 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 01:50:40,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5250, loss[loss=0.1155, beats_loss=0.008802, ecapa_loss=0.0001354, whisper_loss=0.1054, over 21880.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01056, ecapa_loss=0.0001522, whisper_loss=0.09203, over 3866446.32 frames. ], batch size: 84, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:50:42,896 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 01:50:43,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2950850.0, ans=0.125 2024-08-15 01:50:46,993 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 01:50:47,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2950850.0, ans=0.125 2024-08-15 01:50:53,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-08-15 01:50:58,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2950950.0, ans=0.5 2024-08-15 01:51:01,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2950950.0, ans=0.125 2024-08-15 01:51:15,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2951050.0, ans=0.2 2024-08-15 01:51:27,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2024-08-15 01:51:28,953 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=12.0 2024-08-15 01:51:41,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2951250.0, ans=0.125 2024-08-15 01:51:44,123 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 01:51:45,399 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 01:51:48,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2951250.0, ans=0.125 2024-08-15 01:51:51,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5300, loss[loss=0.117, beats_loss=0.009166, ecapa_loss=0.0001565, whisper_loss=0.1063, over 16183.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01054, ecapa_loss=0.0001515, whisper_loss=0.09185, over 3892512.63 frames. ], batch size: 63, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:52:01,752 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 01:52:03,090 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 01:52:09,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2951450.0, ans=0.125 2024-08-15 01:52:17,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2951450.0, ans=0.95 2024-08-15 01:52:38,478 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 13 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-15 01:52:43,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.269e+01 2.498e+01 2.805e+01 4.853e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-15 01:52:51,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. 
limit=6.0 2024-08-15 01:53:06,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2951850.0, ans=0.125 2024-08-15 01:53:07,536 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5350, loss[loss=0.08988, beats_loss=0.01126, ecapa_loss=0.0001325, whisper_loss=0.0773, over 16707.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.000151, whisper_loss=0.09144, over 3878717.73 frames. ], batch size: 67, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:53:09,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2024-08-15 01:53:09,820 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 01:53:10,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2951850.0, ans=0.125 2024-08-15 01:53:21,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2951850.0, ans=0.125 2024-08-15 01:53:39,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2952050.0, ans=0.125 2024-08-15 01:53:47,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-15 01:53:51,296 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 01:53:54,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2952150.0, ans=0.125 2024-08-15 01:54:04,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2952150.0, ans=0.125 2024-08-15 01:54:09,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2952150.0, ans=0.1 2024-08-15 01:54:09,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2952150.0, ans=0.2 2024-08-15 01:54:18,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2952250.0, ans=0.2 2024-08-15 01:54:26,659 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5400, loss[loss=0.1072, beats_loss=0.009687, ecapa_loss=0.0001796, whisper_loss=0.09576, over 17421.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001519, whisper_loss=0.0914, over 3872409.83 frames. ], batch size: 71, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:54:36,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-15 01:55:19,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.348e+01 2.605e+01 2.891e+01 6.130e+01, threshold=5.210e+01, percent-clipped=1.0 2024-08-15 01:55:25,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.14 vs. 
limit=15.0 2024-08-15 01:55:27,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2952750.0, ans=0.125 2024-08-15 01:55:31,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2952750.0, ans=0.0 2024-08-15 01:55:36,524 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 01:55:43,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2952850.0, ans=0.05 2024-08-15 01:55:44,274 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5450, loss[loss=0.0861, beats_loss=0.01207, ecapa_loss=0.0001566, whisper_loss=0.07247, over 20202.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01057, ecapa_loss=0.0001509, whisper_loss=0.09191, over 3879892.65 frames. ], batch size: 84, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:55:45,841 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 01:55:58,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2024-08-15 01:56:01,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2952950.0, ans=0.1 2024-08-15 01:56:01,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2952950.0, ans=0.125 2024-08-15 01:56:48,710 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
19 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-15 01:56:53,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2953250.0, ans=0.125 2024-08-15 01:56:56,246 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 01:57:06,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5500, loss[loss=0.1159, beats_loss=0.009259, ecapa_loss=0.0001164, whisper_loss=0.1054, over 21485.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01055, ecapa_loss=0.0001513, whisper_loss=0.09196, over 3881146.86 frames. ], batch size: 80, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:57:22,176 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-15 01:58:04,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.277e+01 2.522e+01 2.854e+01 1.046e+02, threshold=5.045e+01, percent-clipped=1.0 2024-08-15 01:58:16,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5 2024-08-15 01:58:17,375 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 01:58:27,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5550, loss[loss=0.09925, beats_loss=0.01279, ecapa_loss=0.0001347, whisper_loss=0.0851, over 19403.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.000151, whisper_loss=0.0916, over 3921714.02 frames. 
], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:58:29,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2953850.0, ans=0.0 2024-08-15 01:59:02,143 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:59:17,132 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-15 01:59:47,406 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5600, loss[loss=0.09953, beats_loss=0.009851, ecapa_loss=0.0001546, whisper_loss=0.08813, over 21885.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001521, whisper_loss=0.0909, over 3901340.37 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:59:52,787 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 01:59:54,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2954350.0, ans=0.1 2024-08-15 01:59:57,773 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 02:00:13,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2954450.0, ans=0.125 2024-08-15 02:00:13,825 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.363e+01 2024-08-15 02:00:18,579 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 02:00:20,753 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.28 vs. 
limit=10.0 2024-08-15 02:00:43,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.248e+01 2.477e+01 2.810e+01 7.862e+01, threshold=4.953e+01, percent-clipped=1.0 2024-08-15 02:00:49,905 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 02:01:06,681 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5650, loss[loss=0.1149, beats_loss=0.00992, ecapa_loss=0.0001592, whisper_loss=0.1034, over 16281.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001514, whisper_loss=0.09063, over 3910085.40 frames. ], batch size: 66, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:01:20,377 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 02:01:20,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2954950.0, ans=0.125 2024-08-15 02:01:42,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2955050.0, ans=0.09899494936611666 2024-08-15 02:01:49,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2955050.0, ans=0.1 2024-08-15 02:01:57,218 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-15 02:01:57,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2955150.0, ans=0.125 2024-08-15 02:02:12,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2955250.0, ans=0.05 2024-08-15 02:02:17,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2955350.0, ans=0.2 2024-08-15 02:02:18,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5700, loss[loss=0.1045, beats_loss=0.01086, ecapa_loss=0.0001728, whisper_loss=0.09193, over 22097.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001534, whisper_loss=0.09034, over 3934086.86 frames. ], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:02:18,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2955350.0, ans=0.125 2024-08-15 02:02:35,215 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0 2024-08-15 02:02:38,504 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
30 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 02:02:52,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2955550.0, ans=0.05 2024-08-15 02:03:06,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.504e+01 2.865e+01 3.291e+01 2.428e+02, threshold=5.731e+01, percent-clipped=5.0 2024-08-15 02:03:11,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2955650.0, ans=0.125 2024-08-15 02:03:19,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2955750.0, ans=0.125 2024-08-15 02:03:24,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2955750.0, ans=0.125 2024-08-15 02:03:27,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5750, loss[loss=0.09203, beats_loss=0.012, ecapa_loss=0.0001616, whisper_loss=0.07841, over 21159.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01084, ecapa_loss=0.0001537, whisper_loss=0.08976, over 3933086.92 frames. ], batch size: 86, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:03:32,091 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 02:03:34,664 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 02:03:36,211 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:04:15,613 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-15 02:04:33,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2956250.0, ans=0.125 2024-08-15 02:04:35,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5800, loss[loss=0.1004, beats_loss=0.01004, ecapa_loss=0.0001487, whisper_loss=0.08887, over 14984.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01087, ecapa_loss=0.0001532, whisper_loss=0.08957, over 3920234.87 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:04:39,471 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-15 02:04:41,819 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 40 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 02:04:42,955 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 25 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-15 02:04:47,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2956450.0, ans=0.09899494936611666 2024-08-15 02:04:49,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2956450.0, ans=0.125 2024-08-15 02:04:57,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2956450.0, ans=0.125 2024-08-15 02:04:58,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2956450.0, ans=0.0 2024-08-15 02:05:21,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2956650.0, ans=0.0 2024-08-15 02:05:25,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.342e+01 2.666e+01 2.998e+01 4.632e+01, threshold=5.332e+01, percent-clipped=0.0 2024-08-15 02:05:42,668 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2956750.0, ans=0.125 2024-08-15 02:05:45,511 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5850, loss[loss=0.09312, beats_loss=0.0134, ecapa_loss=0.0001437, whisper_loss=0.07829, over 22150.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001545, whisper_loss=0.09041, over 3904414.79 frames. ], batch size: 94, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:05:55,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2956850.0, ans=0.2 2024-08-15 02:05:55,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2956850.0, ans=0.0 2024-08-15 02:05:56,676 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-15 02:06:10,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2956950.0, ans=0.025 2024-08-15 02:06:26,129 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 02:06:26,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-15 02:06:45,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2957250.0, ans=0.0 2024-08-15 02:06:47,999 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-15 02:06:48,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2957250.0, ans=0.0 2024-08-15 02:06:53,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2957250.0, ans=0.0 2024-08-15 02:06:54,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2957250.0, ans=0.125 2024-08-15 02:06:58,260 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 02:06:59,390 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5900, loss[loss=0.1117, beats_loss=0.009217, ecapa_loss=0.0001675, whisper_loss=0.1008, over 17273.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001554, whisper_loss=0.09065, over 3853911.18 frames. ], batch size: 67, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:07:00,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-15 02:07:14,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2957450.0, ans=0.125 2024-08-15 02:07:15,096 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 02:07:29,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2957550.0, ans=0.0 2024-08-15 02:07:54,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.296e+01 2.523e+01 2.887e+01 4.052e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-15 02:07:56,581 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
15 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 02:08:05,512 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 02:08:06,805 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-15 02:08:15,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 5950, loss[loss=0.09832, beats_loss=0.008879, ecapa_loss=0.0001799, whisper_loss=0.08765, over 22216.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001552, whisper_loss=0.09104, over 3872811.53 frames. ], batch size: 91, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:08:21,131 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 02:08:27,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2957950.0, ans=0.07 2024-08-15 02:08:42,513 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 02:08:42,707 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:08:55,536 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 02:09:26,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2024-08-15 02:09:38,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2958250.0, ans=0.0 2024-08-15 02:09:40,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6000, loss[loss=0.1166, beats_loss=0.006832, ecapa_loss=0.0001551, whisper_loss=0.1082, over 16589.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001542, whisper_loss=0.09093, over 3890506.60 frames. 
], batch size: 61, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:09:40,783 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 02:10:48,018 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005526, whisper_loss=0.2479, over 922467.00 frames. 2024-08-15 02:11:13,999 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on SV_voxceleb1: loss=0.004315, beats_loss=0, ecapa_loss=0.0004315, whisper_loss=0, over 939242.00 frames. 2024-08-15 02:14:20,448 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 02:14:20,452 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 02:14:23,521 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 02:14:41,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2958450.0, ans=0.09899494936611666 2024-08-15 02:15:17,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.230e+01 2.610e+01 2.865e+01 2.775e+02, threshold=5.221e+01, percent-clipped=3.0 2024-08-15 02:15:23,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2958750.0, ans=0.2 2024-08-15 02:15:27,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2958750.0, ans=0.125 2024-08-15 02:15:27,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2958750.0, ans=0.0 2024-08-15 02:15:30,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=2958750.0, ans=6.0 2024-08-15 
02:15:38,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6050, loss[loss=0.0853, beats_loss=0.0113, ecapa_loss=0.000157, whisper_loss=0.07243, over 17828.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001544, whisper_loss=0.09027, over 3888166.12 frames. ], batch size: 74, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:15:40,659 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 16 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 02:15:57,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2958950.0, ans=0.2 2024-08-15 02:16:00,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2958950.0, ans=0.125 2024-08-15 02:16:08,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2024-08-15 02:16:26,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-08-15 02:16:34,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2959250.0, ans=0.0 2024-08-15 02:16:35,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2959250.0, ans=0.125 2024-08-15 02:16:43,674 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6100, loss[loss=0.0943, beats_loss=0.01178, ecapa_loss=0.0001255, whisper_loss=0.08126, over 22597.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.000154, whisper_loss=0.09, over 3911768.09 frames. 
], batch size: 91, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:16:50,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2959350.0, ans=0.125 2024-08-15 02:17:00,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0 2024-08-15 02:17:01,041 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 02:17:13,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2959550.0, ans=0.1 2024-08-15 02:17:29,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.268e+01 2.472e+01 3.000e+01 1.337e+02, threshold=4.943e+01, percent-clipped=1.0 2024-08-15 02:17:35,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2959750.0, ans=0.125 2024-08-15 02:17:38,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2959750.0, ans=0.125 2024-08-15 02:17:39,219 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 15 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-15 02:17:42,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-15 02:17:49,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6150, loss[loss=0.08109, beats_loss=0.01187, ecapa_loss=0.0001462, whisper_loss=0.06776, over 17674.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01075, ecapa_loss=0.0001543, whisper_loss=0.08992, over 3909485.52 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:17:57,757 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
26 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-15 02:17:58,745 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09473193436861038, model_norm_threshold=49.43223190307617 2024-08-15 02:17:58,937 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.424e+04, grad_sumsq=2.424e+04, orig_rms_sq=1.000e+00 2024-08-15 02:18:07,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2959950.0, ans=0.2 2024-08-15 02:19:01,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6200, loss[loss=0.1163, beats_loss=0.01131, ecapa_loss=0.00015, whisper_loss=0.1034, over 23455.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001547, whisper_loss=0.09024, over 3899694.98 frames. ], batch size: 93, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:19:01,281 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 02:19:04,032 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 02:19:09,565 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.711e-02 2024-08-15 02:19:16,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2960450.0, ans=0.125 2024-08-15 02:19:32,094 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 02:19:43,982 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 02:19:49,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.384e+01 2.613e+01 3.033e+01 5.218e+02, threshold=5.226e+01, percent-clipped=4.0 2024-08-15 02:20:00,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2960750.0, ans=0.125 2024-08-15 02:20:09,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6250, loss[loss=0.08189, beats_loss=0.01075, ecapa_loss=0.000114, whisper_loss=0.07, over 14456.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001542, whisper_loss=0.09047, over 3929898.45 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:20:09,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2960850.0, ans=0.0 2024-08-15 02:20:11,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2960850.0, ans=0.125 2024-08-15 02:20:20,418 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 02:20:41,521 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 02:20:46,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2024-08-15 02:20:47,112 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-15 02:20:49,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.58 vs. 
limit=15.0 2024-08-15 02:20:56,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2961150.0, ans=0.125 2024-08-15 02:21:06,289 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 02:21:16,717 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 02:21:17,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6300, loss[loss=0.09088, beats_loss=0.01277, ecapa_loss=0.0001635, whisper_loss=0.07647, over 18587.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001534, whisper_loss=0.09072, over 3910476.15 frames. ], batch size: 79, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:21:22,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5 2024-08-15 02:21:26,927 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 02:21:32,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2961450.0, ans=0.2 2024-08-15 02:21:36,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2961450.0, ans=0.0 2024-08-15 02:21:53,739 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 02:21:57,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2961650.0, ans=0.125 2024-08-15 02:22:04,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.318e+01 2.564e+01 2.783e+01 4.377e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-15 02:22:15,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2961750.0, ans=0.2 2024-08-15 02:22:22,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2961850.0, ans=0.0 2024-08-15 02:22:23,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6350, loss[loss=0.1057, beats_loss=0.008382, ecapa_loss=0.000169, whisper_loss=0.09567, over 14131.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001523, whisper_loss=0.09053, over 3900340.59 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:22:28,104 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 02:22:39,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2024-08-15 02:23:02,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2962150.0, ans=0.125 2024-08-15 02:23:19,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5 2024-08-15 02:23:22,751 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-15 02:23:24,105 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 02:23:29,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6400, loss[loss=0.1048, beats_loss=0.009457, ecapa_loss=0.0001814, whisper_loss=0.0935, over 17123.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001533, whisper_loss=0.0908, over 3886798.88 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:23:34,357 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 16 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 02:23:46,253 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-15 02:23:50,441 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 02:23:54,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2962550.0, ans=0.125 2024-08-15 02:24:01,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2962550.0, ans=0.125 2024-08-15 02:24:14,869 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.481e+01 2.409e+01 2.750e+01 3.071e+01 4.179e+02, threshold=5.499e+01, percent-clipped=4.0 2024-08-15 02:24:20,151 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 02:24:27,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2962750.0, ans=0.0 2024-08-15 02:24:29,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2962750.0, ans=0.125 2024-08-15 02:24:33,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.51 vs. 
limit=15.0 2024-08-15 02:24:34,424 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6450, loss[loss=0.1119, beats_loss=0.01108, ecapa_loss=0.0001622, whisper_loss=0.09916, over 19791.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.000152, whisper_loss=0.09006, over 3869274.19 frames. ], batch size: 79, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:24:34,626 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 02:24:34,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2962850.0, ans=0.0 2024-08-15 02:24:37,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2962850.0, ans=0.125 2024-08-15 02:24:39,523 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 02:24:45,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2962850.0, ans=0.125 2024-08-15 02:24:53,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2962950.0, ans=0.125 2024-08-15 02:24:53,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-15 02:25:02,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2963050.0, ans=0.125 2024-08-15 02:25:03,777 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
23 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 02:25:06,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2963050.0, ans=0.0 2024-08-15 02:25:17,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2963150.0, ans=15.0 2024-08-15 02:25:22,236 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 02:25:23,526 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 19 from Vox, 52 fro AS 2024-08-15 02:25:25,123 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-15 02:25:33,088 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 02:25:40,485 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6500, loss[loss=0.1076, beats_loss=0.01117, ecapa_loss=0.000123, whisper_loss=0.09518, over 22322.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001524, whisper_loss=0.09039, over 3855141.56 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:25:50,011 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 02:25:50,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2963350.0, ans=0.0 2024-08-15 02:26:00,832 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:26:04,723 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.322e-02 2024-08-15 02:26:08,858 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 02:26:26,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.262e+01 2.539e+01 2.761e+01 6.579e+01, threshold=5.077e+01, percent-clipped=1.0 2024-08-15 02:26:41,082 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 02:26:46,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6550, loss[loss=0.1214, beats_loss=0.009819, ecapa_loss=0.0001446, whisper_loss=0.1101, over 18688.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001519, whisper_loss=0.09091, over 3900067.05 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:27:05,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2963950.0, ans=0.125 2024-08-15 02:27:10,045 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-15 02:27:20,234 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 02:27:24,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2964150.0, ans=0.125 2024-08-15 02:27:26,664 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 02:27:50,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6600, loss[loss=0.1117, beats_loss=0.007347, ecapa_loss=0.0001649, whisper_loss=0.1027, over 18853.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001526, whisper_loss=0.09152, over 3938561.21 frames. 
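The recurring `optim.py` lines are internally consistent: each "grad-norm quartiles ... threshold=..." entry reports a threshold equal to `Clipping_scale` (2.0) times the median of recent gradient norms, and each "Scaling gradients by s, model_norm_threshold=t" warning implies a batch gradient norm of t / s. A sketch of that behaviour, as an assumption about the logged mechanism rather than the exact icefall implementation:

```python
# Sketch (an assumption, not the exact icefall optimizer code) of the clipping
# behaviour these optim.py lines suggest: the threshold is clipping_scale times
# the median of recently observed gradient norms, and when a batch's norm
# exceeds the threshold, gradients are scaled by threshold / norm.
import statistics

def clip_scale(grad_norm: float, recent_norms: list[float],
               clipping_scale: float = 2.0) -> float:
    threshold = clipping_scale * statistics.median(recent_norms)
    return min(1.0, threshold / grad_norm)

# The logged "Scaling gradients by 0.0947..., model_norm_threshold=49.43..."
# implies a batch gradient norm of about 49.43 / 0.0947 ~ 522, which matches
# the 5.218e+02 maximum reported in the next quartile line.
assert abs(49.43223190307617 / 0.09473193436861038 - 521.8) < 1.0
```

With the quartiles logged at 02:17:29 (median 2.472e+01), 2.0 × 24.72 ≈ 49.43 reproduces the reported threshold.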
], batch size: 73, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:27:55,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2964350.0, ans=0.2 2024-08-15 02:27:56,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2964350.0, ans=0.2 2024-08-15 02:28:11,310 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 02:28:19,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2024-08-15 02:28:35,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.343e+01 2.654e+01 2.960e+01 4.414e+01, threshold=5.309e+01, percent-clipped=0.0 2024-08-15 02:28:44,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2964750.0, ans=0.1 2024-08-15 02:28:52,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2964750.0, ans=0.1 2024-08-15 02:28:55,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6650, loss[loss=0.1245, beats_loss=0.01011, ecapa_loss=0.000173, whisper_loss=0.1127, over 23318.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01063, ecapa_loss=0.0001537, whisper_loss=0.0918, over 3931633.68 frames. ], batch size: 91, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:29:01,583 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.690e-03 2024-08-15 02:29:19,206 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 02:29:26,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2965050.0, ans=0.09899494936611666 2024-08-15 02:29:28,762 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 32 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-15 02:29:35,524 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 02:29:46,053 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-15 02:29:48,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2965250.0, ans=0.125 2024-08-15 02:30:01,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6700, loss[loss=0.1189, beats_loss=0.008346, ecapa_loss=0.0001512, whisper_loss=0.1091, over 21579.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01047, ecapa_loss=0.0001544, whisper_loss=0.09252, over 3935999.24 frames. ], batch size: 80, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:30:08,123 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 02:30:13,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2965350.0, ans=0.0 2024-08-15 02:30:14,356 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 02:30:25,808 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 02:30:35,245 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 02:30:51,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.368e+01 2.669e+01 2.995e+01 9.040e+01, threshold=5.338e+01, percent-clipped=3.0 2024-08-15 02:31:02,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2965750.0, ans=0.1 2024-08-15 02:31:12,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2024-08-15 02:31:14,451 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6750, loss[loss=0.1125, beats_loss=0.009966, ecapa_loss=0.0001112, whisper_loss=0.1014, over 17513.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01047, ecapa_loss=0.0001536, whisper_loss=0.09253, over 3904739.59 frames. ], batch size: 64, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:31:16,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2965850.0, ans=0.0 2024-08-15 02:31:18,861 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 02:31:28,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2965950.0, ans=0.125 2024-08-15 02:31:36,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2965950.0, ans=0.125 2024-08-15 02:31:39,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2965950.0, ans=0.125 2024-08-15 02:31:41,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2965950.0, ans=0.1 2024-08-15 02:32:13,879 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-08-15 02:32:29,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6800, loss[loss=0.1238, beats_loss=0.008715, ecapa_loss=0.0001477, whisper_loss=0.1136, over 21950.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0105, ecapa_loss=0.0001538, whisper_loss=0.0921, over 3920932.05 frames. 
], batch size: 83, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:32:30,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2966350.0, ans=0.2 2024-08-15 02:32:33,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2966350.0, ans=0.1 2024-08-15 02:33:02,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2966550.0, ans=0.125 2024-08-15 02:33:08,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2966550.0, ans=0.0 2024-08-15 02:33:25,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.273e+01 2.576e+01 2.834e+01 4.792e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-15 02:33:25,409 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 02:33:36,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2966750.0, ans=0.95 2024-08-15 02:33:45,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2966850.0, ans=0.0 2024-08-15 02:33:46,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6850, loss[loss=0.102, beats_loss=0.01185, ecapa_loss=0.0001311, whisper_loss=0.08886, over 16709.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001532, whisper_loss=0.09161, over 3931130.28 frames. 
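The `scaling.py:214` lines track `ScheduledFloat` parameters (skip rates, balancer probabilities) whose value depends on `batch_count`. A minimal sketch of a piecewise-linear schedule of that kind, clamped at its endpoints; the breakpoints below are illustrative assumptions, not values from this run:

```python
# Minimal sketch of a batch_count-keyed schedule like the ScheduledFloat values
# logged above: linear interpolation between (batch_count, value) breakpoints,
# clamped to the first/last value outside the breakpoint range.
# (Breakpoints here are illustrative, not taken from this training run.)
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            frac = (batch_count - x0) / (x1 - x0)
            return y0 + frac * (y1 - y0)

# A skip rate decaying from 0.5 to 0.0 over the first 4000 batches has long
# since settled at its final value by batch_count ~ 2.97e6, matching the
# many "skip_rate ... ans=0.0" entries in this log.
assert scheduled_float(2966350.0, [(0.0, 0.5), (4000.0, 0.0)]) == 0.0
```

This explains why the late-training entries here report constant values such as `ans=0.0` and `ans=0.125`: the schedules have reached their final breakpoints.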
], batch size: 62, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:33:54,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2966850.0, ans=0.0 2024-08-15 02:34:17,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2967050.0, ans=0.125 2024-08-15 02:34:34,169 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 02:35:00,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2967250.0, ans=0.2 2024-08-15 02:35:02,728 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 02:35:05,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6900, loss[loss=0.09582, beats_loss=0.01187, ecapa_loss=0.0001333, whisper_loss=0.08261, over 20520.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.0001536, whisper_loss=0.09169, over 3926956.59 frames. ], batch size: 80, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:35:06,109 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-15 02:35:19,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2967350.0, ans=0.125 2024-08-15 02:35:24,434 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-15 02:35:38,743 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-15 02:35:49,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2967550.0, ans=0.125 2024-08-15 02:35:50,928 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
22 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-15 02:36:02,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.306e+01 2.522e+01 2.757e+01 3.704e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-15 02:36:06,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2967650.0, ans=0.5 2024-08-15 02:36:11,452 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-15 02:36:24,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 6950, loss[loss=0.1123, beats_loss=0.01042, ecapa_loss=0.0001574, whisper_loss=0.1003, over 21025.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01061, ecapa_loss=0.000153, whisper_loss=0.09219, over 3948663.15 frames. ], batch size: 86, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:36:29,277 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-15 02:36:49,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2967950.0, ans=0.0 2024-08-15 02:36:54,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2968050.0, ans=0.1 2024-08-15 02:36:58,215 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 02:36:58,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2968050.0, ans=0.125 2024-08-15 02:37:03,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2968050.0, ans=0.125 2024-08-15 02:37:19,452 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 02:37:40,290 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7000, loss[loss=0.1112, beats_loss=0.01059, ecapa_loss=0.0001637, whisper_loss=0.09896, over 16343.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001525, whisper_loss=0.09122, over 3922923.03 frames. ], batch size: 66, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:37:47,676 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 02:38:07,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2968450.0, ans=0.125 2024-08-15 02:38:14,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0 2024-08-15 02:38:20,008 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 02:38:38,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.236e+01 2.500e+01 2.764e+01 4.319e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-15 02:38:43,516 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 02:39:00,323 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7050, loss[loss=0.1079, beats_loss=0.009186, ecapa_loss=0.000158, whisper_loss=0.09717, over 15858.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001527, whisper_loss=0.09089, over 3906622.66 frames. ], batch size: 64, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:39:04,168 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 02:39:24,739 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 02:39:28,216 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
17 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 02:39:28,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2968950.0, ans=0.125 2024-08-15 02:39:43,363 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 02:40:04,948 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 02:40:05,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2969250.0, ans=0.125 2024-08-15 02:40:20,727 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7100, loss[loss=0.08576, beats_loss=0.01046, ecapa_loss=0.0001428, whisper_loss=0.07387, over 15891.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001521, whisper_loss=0.0907, over 3874964.92 frames. ], batch size: 64, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:40:40,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2969450.0, ans=0.125 2024-08-15 02:40:44,128 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 02:41:00,425 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 02:41:19,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.324e+01 2.523e+01 2.719e+01 3.184e+02, threshold=5.045e+01, percent-clipped=4.0 2024-08-15 02:41:19,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2969650.0, ans=0.2 2024-08-15 02:41:32,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2969750.0, ans=0.2 2024-08-15 02:41:42,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7150, loss[loss=0.1003, beats_loss=0.01228, ecapa_loss=0.0001605, whisper_loss=0.08642, over 20177.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001514, whisper_loss=0.09067, over 3889270.59 frames. ], batch size: 83, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:41:44,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2969850.0, ans=0.1 2024-08-15 02:42:15,890 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-15 02:42:28,602 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 02:42:52,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2970250.0, ans=0.125 2024-08-15 02:42:59,566 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 14 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-15 02:43:03,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7200, loss[loss=0.09606, beats_loss=0.01103, ecapa_loss=0.0001409, whisper_loss=0.08363, over 17235.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001526, whisper_loss=0.0908, over 3913093.38 frames. 
], batch size: 66, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:43:17,036 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 02:43:19,590 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 02:43:26,952 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 02:43:33,105 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-15 02:44:03,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.341e+01 2.613e+01 2.912e+01 4.502e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-15 02:44:11,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2970750.0, ans=0.035 2024-08-15 02:44:23,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2970850.0, ans=0.2 2024-08-15 02:44:24,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7250, loss[loss=0.105, beats_loss=0.00914, ecapa_loss=0.0001999, whisper_loss=0.0939, over 16375.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001524, whisper_loss=0.09113, over 3896428.76 frames. ], batch size: 70, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:44:25,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.04 vs. 
limit=15.0 2024-08-15 02:44:35,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2970850.0, ans=0.125 2024-08-15 02:44:35,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2970850.0, ans=0.1 2024-08-15 02:44:37,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2970850.0, ans=0.0 2024-08-15 02:44:38,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2970850.0, ans=0.09899494936611666 2024-08-15 02:44:46,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2970950.0, ans=0.0 2024-08-15 02:44:53,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2024-08-15 02:45:09,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2971050.0, ans=0.0 2024-08-15 02:45:33,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2971250.0, ans=0.1 2024-08-15 02:45:35,901 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 02:45:46,096 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-15 02:45:47,117 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7300, loss[loss=0.1037, beats_loss=0.01268, ecapa_loss=0.0001171, whisper_loss=0.08986, over 17891.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01057, ecapa_loss=0.0001532, whisper_loss=0.09143, over 3877122.26 frames. 
], batch size: 68, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:46:00,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2971350.0, ans=0.125 2024-08-15 02:46:10,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2971450.0, ans=0.0 2024-08-15 02:46:13,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2971450.0, ans=0.1 2024-08-15 02:46:15,004 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:46:46,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.342e+01 2.606e+01 2.963e+01 2.884e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 02:46:50,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2971650.0, ans=0.125 2024-08-15 02:46:52,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2971750.0, ans=0.125 2024-08-15 02:47:02,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2971750.0, ans=0.125 2024-08-15 02:47:06,493 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 02:47:09,931 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7350, loss[loss=0.1094, beats_loss=0.01058, ecapa_loss=0.0001364, whisper_loss=0.09746, over 18235.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001531, whisper_loss=0.09112, over 3864055.93 frames. ], batch size: 67, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:47:10,110 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 02:47:15,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2971850.0, ans=0.125 2024-08-15 02:47:17,325 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 02:47:33,432 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 02:48:06,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-15 02:48:15,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2972250.0, ans=0.125 2024-08-15 02:48:32,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7400, loss[loss=0.08301, beats_loss=0.01374, ecapa_loss=0.0001717, whisper_loss=0.06756, over 16875.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01052, ecapa_loss=0.0001536, whisper_loss=0.09185, over 3861727.43 frames. ], batch size: 73, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:48:32,844 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 02:48:35,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2972350.0, ans=0.0 2024-08-15 02:48:39,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2972350.0, ans=0.0 2024-08-15 02:48:46,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.17 vs. 
limit=22.5 2024-08-15 02:49:00,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2972450.0, ans=0.0 2024-08-15 02:49:06,842 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 02:49:18,976 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 02:49:29,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2972650.0, ans=0.125 2024-08-15 02:49:30,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2972650.0, ans=0.125 2024-08-15 02:49:30,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2972650.0, ans=0.0 2024-08-15 02:49:31,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.322e+01 2.605e+01 2.983e+01 4.527e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 02:49:38,003 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 21 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 02:49:41,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2972750.0, ans=0.0 2024-08-15 02:49:53,885 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7450, loss[loss=0.1077, beats_loss=0.009639, ecapa_loss=0.0001769, whisper_loss=0.09629, over 19913.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01053, ecapa_loss=0.0001546, whisper_loss=0.09174, over 3871136.19 frames. 
], batch size: 83, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:50:08,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2972850.0, ans=0.1 2024-08-15 02:50:09,012 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 02:50:24,772 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 22 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-15 02:50:27,917 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 02:50:45,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2973150.0, ans=0.0 2024-08-15 02:50:53,831 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 02:50:57,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2973150.0, ans=0.2 2024-08-15 02:51:13,198 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 02:51:16,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2973350.0, ans=0.2 2024-08-15 02:51:16,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7500, loss[loss=0.1096, beats_loss=0.01054, ecapa_loss=0.0001548, whisper_loss=0.0975, over 21987.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0105, ecapa_loss=0.0001547, whisper_loss=0.09187, over 3883123.42 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:51:29,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2973350.0, ans=0.125 2024-08-15 02:51:31,843 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 02:51:43,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2973450.0, ans=0.125 2024-08-15 02:52:02,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=12.0 2024-08-15 02:52:05,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2973650.0, ans=0.125 2024-08-15 02:52:15,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.356e+01 2.622e+01 2.952e+01 4.347e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-15 02:52:34,311 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 02:52:38,977 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7550, loss[loss=0.1224, beats_loss=0.007129, ecapa_loss=0.0001543, whisper_loss=0.1137, over 19443.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001545, whisper_loss=0.09166, over 3869232.28 frames. ], batch size: 73, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:52:50,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2973850.0, ans=0.125 2024-08-15 02:53:35,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2974150.0, ans=0.125 2024-08-15 02:53:51,207 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.545e+05 2024-08-15 02:53:51,438 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. 
limit=15.0 2024-08-15 02:53:57,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2974350.0, ans=0.125 2024-08-15 02:53:58,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7600, loss[loss=0.1015, beats_loss=0.01103, ecapa_loss=0.0001496, whisper_loss=0.08902, over 20565.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001536, whisper_loss=0.09122, over 3829893.05 frames. ], batch size: 80, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:54:06,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2974350.0, ans=0.0 2024-08-15 02:54:10,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.34 vs. limit=22.5 2024-08-15 02:54:30,439 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-15 02:54:31,417 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:54:35,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-15 02:54:37,372 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 35 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 02:54:37,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2974550.0, ans=0.0 2024-08-15 02:54:44,015 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 02:54:45,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2974650.0, ans=0.2 2024-08-15 02:54:45,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2974650.0, ans=0.0 2024-08-15 02:54:53,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2974650.0, ans=0.1 2024-08-15 02:54:55,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.311e+01 2.587e+01 3.162e+01 4.205e+02, threshold=5.175e+01, percent-clipped=3.0 2024-08-15 02:54:56,465 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 02:55:10,981 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-15 02:55:17,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7650, loss[loss=0.09527, beats_loss=0.01076, ecapa_loss=0.0001319, whisper_loss=0.08319, over 22330.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001541, whisper_loss=0.09118, over 3855623.69 frames. ], batch size: 88, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:55:19,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0 2024-08-15 02:55:43,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2974950.0, ans=0.125 2024-08-15 02:56:09,696 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 02:56:15,862 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 02:56:35,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7700, loss[loss=0.1095, beats_loss=0.01036, ecapa_loss=0.0001589, whisper_loss=0.09756, over 22732.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001535, whisper_loss=0.09095, over 3850135.58 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:56:35,228 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 02:56:45,448 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 29 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 02:56:54,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2975450.0, ans=0.09899494936611666 2024-08-15 02:56:59,391 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 39 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 02:57:01,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2975450.0, ans=0.1 2024-08-15 02:57:03,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2975450.0, ans=0.125 2024-08-15 02:57:30,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2975650.0, ans=0.0 2024-08-15 02:57:31,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.248e+01 2.489e+01 2.817e+01 2.674e+02, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 02:57:40,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-15 02:57:52,849 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7750, loss[loss=0.08939, beats_loss=0.01293, ecapa_loss=0.0001379, whisper_loss=0.07508, over 18134.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001532, whisper_loss=0.09002, over 3853563.29 frames. ], batch size: 72, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:58:02,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-15 02:58:04,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2975850.0, ans=0.5 2024-08-15 02:58:05,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=12.0 2024-08-15 02:58:07,847 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 02:58:10,457 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 02:58:34,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.87 vs. limit=8.0 2024-08-15 02:58:37,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=12.0 2024-08-15 02:58:46,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-15 02:59:04,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2976250.0, ans=0.05 2024-08-15 02:59:09,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7800, loss[loss=0.0641, beats_loss=0.01225, ecapa_loss=0.0001025, whisper_loss=0.05082, over 14884.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.000153, whisper_loss=0.09003, over 3873326.38 frames. 
], batch size: 57, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:59:22,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2976350.0, ans=0.125 2024-08-15 02:59:23,777 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 02:59:41,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2976550.0, ans=0.5 2024-08-15 02:59:53,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2024-08-15 03:00:03,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2976650.0, ans=0.125 2024-08-15 03:00:06,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.382e+01 2.617e+01 2.970e+01 1.321e+02, threshold=5.235e+01, percent-clipped=4.0 2024-08-15 03:00:18,454 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 03:00:29,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2976850.0, ans=0.125 2024-08-15 03:00:30,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7850, loss[loss=0.1011, beats_loss=0.009408, ecapa_loss=0.0001263, whisper_loss=0.09041, over 18277.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.000153, whisper_loss=0.0904, over 3877826.31 frames. 
], batch size: 70, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:00:37,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2976850.0, ans=0.125 2024-08-15 03:01:03,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2977050.0, ans=0.0 2024-08-15 03:01:07,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0 2024-08-15 03:01:25,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2977150.0, ans=0.125 2024-08-15 03:01:25,883 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-15 03:01:32,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2977150.0, ans=0.125 2024-08-15 03:01:35,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2977250.0, ans=0.125 2024-08-15 03:01:43,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2977250.0, ans=0.1 2024-08-15 03:01:53,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7900, loss[loss=0.1073, beats_loss=0.01107, ecapa_loss=0.0001547, whisper_loss=0.09472, over 21779.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001534, whisper_loss=0.09079, over 3886331.37 frames. 
], batch size: 88, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:01:58,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2977350.0, ans=0.0 2024-08-15 03:02:27,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=12.0 2024-08-15 03:02:51,127 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 03:02:53,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.325e+01 2.726e+01 3.089e+01 1.885e+02, threshold=5.452e+01, percent-clipped=1.0 2024-08-15 03:03:15,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 7950, loss[loss=0.1203, beats_loss=0.008674, ecapa_loss=0.0001612, whisper_loss=0.11, over 23366.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001529, whisper_loss=0.09113, over 3902238.23 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:03:19,196 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 03:03:43,163 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 03:03:44,415 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 03:04:14,378 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 03:04:23,653 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 11 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 03:04:25,412 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 03:04:27,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2978250.0, ans=0.0 2024-08-15 03:04:34,099 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 30 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 03:04:37,489 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8000, loss[loss=0.07077, beats_loss=0.01137, ecapa_loss=0.0001388, whisper_loss=0.058, over 16403.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001529, whisper_loss=0.09186, over 3904237.24 frames. ], batch size: 65, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:04:37,664 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 03:04:48,016 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 03:04:53,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2978450.0, ans=0.125 2024-08-15 03:04:54,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2978450.0, ans=0.125 2024-08-15 03:04:59,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2978450.0, ans=0.0 2024-08-15 03:05:02,218 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 03:05:03,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2978450.0, ans=0.125 2024-08-15 03:05:10,503 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 03:05:25,537 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 03:05:31,600 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 03:05:35,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.399e+01 2.701e+01 3.155e+01 4.080e+02, threshold=5.401e+01, percent-clipped=3.0 2024-08-15 03:05:38,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2978650.0, ans=0.125 2024-08-15 03:05:43,020 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 03:05:52,131 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 03:05:57,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8050, loss[loss=0.1232, beats_loss=0.009818, ecapa_loss=0.0001784, whisper_loss=0.1116, over 21957.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01055, ecapa_loss=0.0001535, whisper_loss=0.09221, over 3888275.63 frames. ], batch size: 89, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:06:04,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2978850.0, ans=0.2 2024-08-15 03:06:12,231 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:06:18,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2978950.0, ans=0.125 2024-08-15 03:06:20,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2978950.0, ans=0.2 2024-08-15 03:06:24,301 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 03:06:28,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=22.5 2024-08-15 03:06:33,236 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 21 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-15 03:06:33,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2979050.0, ans=0.125 2024-08-15 03:06:36,406 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 03:06:54,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2979150.0, ans=0.0 2024-08-15 03:07:00,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2979250.0, ans=0.125 2024-08-15 03:07:04,700 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 03:07:17,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8100, loss[loss=0.0943, beats_loss=0.01217, ecapa_loss=0.0001339, whisper_loss=0.08079, over 22705.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001524, whisper_loss=0.0918, over 3887689.95 frames. 
], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:07:17,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2979350.0, ans=0.0 2024-08-15 03:07:23,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2979350.0, ans=0.125 2024-08-15 03:07:27,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2979350.0, ans=0.125 2024-08-15 03:07:49,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2979450.0, ans=0.0 2024-08-15 03:07:53,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2979550.0, ans=0.125 2024-08-15 03:08:02,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2979550.0, ans=0.0 2024-08-15 03:08:13,357 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 03:08:16,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.318e+01 2.516e+01 2.878e+01 5.938e+01, threshold=5.033e+01, percent-clipped=1.0 2024-08-15 03:08:30,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2979750.0, ans=0.125 2024-08-15 03:08:31,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. 
limit=15.0 2024-08-15 03:08:34,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2979750.0, ans=0.1 2024-08-15 03:08:38,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8150, loss[loss=0.08827, beats_loss=0.01412, ecapa_loss=0.0001379, whisper_loss=0.07276, over 21409.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.000153, whisper_loss=0.09141, over 3885567.08 frames. ], batch size: 90, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:09:04,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2024-08-15 03:09:08,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=15.0 2024-08-15 03:09:21,246 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 03:09:45,112 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-15 03:09:56,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2980250.0, ans=0.125 2024-08-15 03:10:00,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8200, loss[loss=0.1102, beats_loss=0.01034, ecapa_loss=0.000153, whisper_loss=0.09834, over 22481.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001525, whisper_loss=0.09104, over 3895964.00 frames. 
], batch size: 91, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:10:07,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2980350.0, ans=0.1 2024-08-15 03:10:14,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-15 03:10:27,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2980450.0, ans=0.0 2024-08-15 03:10:33,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2980550.0, ans=0.1 2024-08-15 03:10:39,426 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS 2024-08-15 03:10:43,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2980550.0, ans=0.125 2024-08-15 03:10:50,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2980650.0, ans=0.0 2024-08-15 03:10:51,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2980650.0, ans=0.2 2024-08-15 03:10:51,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2980650.0, ans=0.0 2024-08-15 03:10:53,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2980650.0, ans=0.125 2024-08-15 03:11:00,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.326e+01 2.553e+01 2.974e+01 4.367e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-15 03:11:05,988 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 20 from Vox, 40 from AS 2024-08-15 03:11:06,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2980750.0, ans=0.125 2024-08-15 03:11:23,106 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8250, loss[loss=0.1027, beats_loss=0.01179, ecapa_loss=0.0001315, whisper_loss=0.08956, over 20837.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001525, whisper_loss=0.09111, over 3900481.48 frames. ], batch size: 84, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:11:26,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2980850.0, ans=0.1 2024-08-15 03:11:29,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2980850.0, ans=0.0 2024-08-15 03:11:34,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-08-15 03:11:43,598 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 from AS 2024-08-15 03:12:01,877 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 03:12:08,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.31 vs. 
limit=22.5 2024-08-15 03:12:13,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2981150.0, ans=0.125 2024-08-15 03:12:40,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2981250.0, ans=0.1 2024-08-15 03:12:40,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2981250.0, ans=0.0 2024-08-15 03:12:47,523 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8300, loss[loss=0.09592, beats_loss=0.01295, ecapa_loss=0.0001356, whisper_loss=0.08161, over 22825.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001505, whisper_loss=0.09093, over 3913595.62 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:12:59,014 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-15 03:13:15,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2981450.0, ans=0.125 2024-08-15 03:13:33,955 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 03:13:35,052 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 from AS 2024-08-15 03:13:44,785 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 03:13:46,512 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 from AS 2024-08-15 03:13:47,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.355e+01 2.573e+01 2.835e+01 6.620e+01, threshold=5.146e+01, percent-clipped=1.0 2024-08-15 03:13:54,281 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 22 from Vox, 29 from AS 2024-08-15 03:14:11,155 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8350, loss[loss=0.1023, beats_loss=0.00999, ecapa_loss=0.0001484, whisper_loss=0.09083, over 17819.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001503, whisper_loss=0.09094, over 3912593.89 frames. ], batch size: 67, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:14:49,895 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 03:14:56,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2982050.0, ans=0.2 2024-08-15 03:15:19,477 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 from AS 2024-08-15 03:15:21,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2982250.0, ans=0.125 2024-08-15 03:15:25,794 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 from AS 2024-08-15 03:15:29,471 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 03:15:34,327 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8400, loss[loss=0.07908, beats_loss=0.01117, ecapa_loss=0.0001169, whisper_loss=0.06673, over 19394.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001499, whisper_loss=0.09083, over 3877254.98 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:15:46,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2982350.0, ans=0.2 2024-08-15 03:15:51,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.67 vs. 
limit=22.5 2024-08-15 03:15:53,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2982450.0, ans=0.125 2024-08-15 03:15:57,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2982450.0, ans=0.0 2024-08-15 03:16:02,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2982450.0, ans=0.0 2024-08-15 03:16:24,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2982650.0, ans=0.125 2024-08-15 03:16:32,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2982650.0, ans=0.0 2024-08-15 03:16:33,633 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 from AS 2024-08-15 03:16:36,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.324e+01 2.482e+01 2.790e+01 5.297e+01, threshold=4.963e+01, percent-clipped=1.0 2024-08-15 03:16:52,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2982750.0, ans=0.125 2024-08-15 03:17:01,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2982850.0, ans=0.125 2024-08-15 03:17:02,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8450, loss[loss=0.1238, beats_loss=0.01057, ecapa_loss=0.0001562, whisper_loss=0.1117, over 22740.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01054, ecapa_loss=0.0001519, whisper_loss=0.09189, over 3856540.69 frames. ], batch size: 89, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:17:05,541 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 19 from Vox, 45 from AS 2024-08-15 03:17:35,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2983050.0, ans=0.0 2024-08-15 03:17:56,864 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 18 from LS+wenet, 37 from Vox, 39 from AS 2024-08-15 03:18:22,437 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 from AS 2024-08-15 03:18:23,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8500, loss[loss=0.09761, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.08561, over 16810.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.09102, over 3866545.12 frames. ], batch size: 67, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:18:39,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2983450.0, ans=0.0 2024-08-15 03:18:52,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2983450.0, ans=0.5 2024-08-15 03:18:57,007 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 25 from Vox, 27 from AS 2024-08-15 03:18:58,202 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS 2024-08-15 03:19:18,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2983650.0, ans=15.0 2024-08-15 03:19:21,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.305e+01 2.558e+01 2.940e+01 1.198e+02, threshold=5.115e+01, percent-clipped=1.0 2024-08-15 03:19:24,792 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 from AS 2024-08-15 03:19:25,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2983650.0, ans=0.0 2024-08-15 03:19:46,064 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8550, loss[loss=0.09806, beats_loss=0.01201, ecapa_loss=0.0001572, whisper_loss=0.08447, over 16415.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001519, whisper_loss=0.09137, over 3874305.32 frames. ], batch size: 69, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:19:47,035 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2024-08-15 03:19:59,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2983850.0, ans=0.0 2024-08-15 03:20:09,172 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 17 from Vox, 34 from AS 2024-08-15 03:20:11,732 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-15 03:20:14,061 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 37 from Vox, 33 from AS 2024-08-15 03:20:23,724 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 from AS 2024-08-15 03:21:04,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2984250.0, ans=0.125 2024-08-15 03:21:07,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8600, loss[loss=0.1135, beats_loss=0.008922, ecapa_loss=0.0001396, whisper_loss=0.1032, over 16977.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0105, ecapa_loss=0.0001515, whisper_loss=0.09196, over 3860961.09 frames. 
], batch size: 65, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:21:08,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2984350.0, ans=0.125 2024-08-15 03:21:15,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2984350.0, ans=0.125 2024-08-15 03:21:41,414 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 from AS 2024-08-15 03:21:43,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2984550.0, ans=0.0 2024-08-15 03:21:44,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.55 vs. limit=22.5 2024-08-15 03:22:06,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.403e+01 2.689e+01 2.856e+01 3.948e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-15 03:22:07,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-15 03:22:09,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2024-08-15 03:22:13,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2984750.0, ans=0.125 2024-08-15 03:22:17,147 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
22 from LS+wenet, 16 from Vox, 32 from AS 2024-08-15 03:22:17,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2984750.0, ans=0.0 2024-08-15 03:22:28,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.87 vs. limit=10.0 2024-08-15 03:22:29,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8650, loss[loss=0.1404, beats_loss=0.008844, ecapa_loss=0.0001643, whisper_loss=0.1299, over 19177.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001509, whisper_loss=0.09095, over 3873406.74 frames. ], batch size: 73, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:22:34,311 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 30 from LS+wenet, 17 from Vox, 18 from AS 2024-08-15 03:23:15,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2985050.0, ans=0.125 2024-08-15 03:23:15,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2985050.0, ans=15.0 2024-08-15 03:23:17,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2985050.0, ans=0.09899494936611666 2024-08-15 03:23:37,038 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS 2024-08-15 03:23:38,997 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 03:23:47,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2985250.0, ans=0.2 2024-08-15 03:23:55,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8700, loss[loss=0.1101, beats_loss=0.01044, ecapa_loss=0.0001311, whisper_loss=0.09838, over 17174.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001509, whisper_loss=0.09088, over 3866843.54 frames. ], batch size: 67, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:24:14,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2985450.0, ans=0.0 2024-08-15 03:24:27,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2985450.0, ans=0.5 2024-08-15 03:24:33,481 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 from AS 2024-08-15 03:24:38,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-15 03:24:57,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2985650.0, ans=0.1 2024-08-15 03:24:57,959 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:25:05,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.433e+01 2.657e+01 2.885e+01 1.161e+02, threshold=5.314e+01, percent-clipped=2.0 2024-08-15 03:25:20,147 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 12 from Vox, 34 from AS 2024-08-15 03:25:29,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2985850.0, ans=0.125 2024-08-15 03:25:30,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8750, loss[loss=0.1014, beats_loss=0.01145, ecapa_loss=0.0001263, whisper_loss=0.08869, over 18100.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001522, whisper_loss=0.09178, over 3857342.39 frames. 
], batch size: 69, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:25:40,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=22.5 2024-08-15 03:25:51,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-08-15 03:26:08,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2024-08-15 03:26:11,195 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 31 from LS+wenet, 18 from Vox, 26 from AS 2024-08-15 03:26:11,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2986050.0, ans=0.025 2024-08-15 03:26:45,436 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 03:26:51,018 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 8 from Vox, 30 from AS 2024-08-15 03:27:02,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8800, loss[loss=0.1156, beats_loss=0.01136, ecapa_loss=0.0001021, whisper_loss=0.1032, over 22853.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001509, whisper_loss=0.09179, over 3855905.19 frames. ], batch size: 84, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:27:11,269 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
23 from LS+wenet, 22 from Vox, 33 from AS 2024-08-15 03:27:40,183 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:27:40,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2986550.0, ans=0.125 2024-08-15 03:27:47,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2986550.0, ans=0.2 2024-08-15 03:27:50,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2986550.0, ans=0.2 2024-08-15 03:27:56,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2986650.0, ans=0.1 2024-08-15 03:27:56,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2986650.0, ans=0.125 2024-08-15 03:28:05,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.266e+01 2.513e+01 2.884e+01 4.202e+01, threshold=5.025e+01, percent-clipped=0.0 2024-08-15 03:28:07,008 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 20 from Vox, 44 from AS 2024-08-15 03:28:24,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2986750.0, ans=0.125 2024-08-15 03:28:28,952 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8850, loss[loss=0.1191, beats_loss=0.007578, ecapa_loss=0.0001691, whisper_loss=0.1098, over 19024.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.00015, whisper_loss=0.09042, over 3843653.02 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:28:29,752 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
24 from LS+wenet, 24 from Vox, 33 from AS 2024-08-15 03:28:30,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=12.0 2024-08-15 03:28:46,243 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 03:29:38,117 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS 2024-08-15 03:29:56,667 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8900, loss[loss=0.1088, beats_loss=0.009927, ecapa_loss=0.0001733, whisper_loss=0.09711, over 21971.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.0001505, whisper_loss=0.09024, over 3885058.91 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:30:06,870 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 from AS 2024-08-15 03:30:18,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2987450.0, ans=0.2 2024-08-15 03:30:25,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2987450.0, ans=0.2 2024-08-15 03:30:33,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2987550.0, ans=0.07 2024-08-15 03:31:00,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2987650.0, ans=0.125 2024-08-15 03:31:01,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.296e+01 2.671e+01 2.935e+01 5.477e+01, threshold=5.343e+01, percent-clipped=1.0 2024-08-15 03:31:27,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 8950, loss[loss=0.1153, beats_loss=0.01085, ecapa_loss=0.0001489, whisper_loss=0.1029, over 23155.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001505, whisper_loss=0.0904, over 3902711.87 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:31:46,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2987850.0, ans=0.125 2024-08-15 03:31:48,432 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 18 from Vox, 43 from AS 2024-08-15 03:32:59,178 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9000, loss[loss=0.114, beats_loss=0.009618, ecapa_loss=0.0001683, whisper_loss=0.1027, over 19988.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01089, ecapa_loss=0.0001499, whisper_loss=0.08991, over 3918996.80 frames. ], batch size: 80, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:32:59,178 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 03:33:42,800 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on ASR_libri: loss=0.2525, beats_loss=0, ecapa_loss=0.0005419, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 03:34:03,038 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on SV_voxceleb1: loss=0.004236, beats_loss=0, ecapa_loss=0.0004236, whisper_loss=0, over 939242.00 frames. 2024-08-15 03:35:55,379 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 03:35:55,382 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 03:36:06,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2988350.0, ans=0.0 2024-08-15 03:36:09,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2988350.0, ans=0.125 2024-08-15 03:36:11,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. limit=10.0 2024-08-15 03:36:21,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2988450.0, ans=0.0 2024-08-15 03:36:48,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2988650.0, ans=0.1 2024-08-15 03:36:57,407 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 32 from LS+wenet, 13 from Vox, 19 from AS 2024-08-15 03:37:00,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.331e+01 2.598e+01 2.772e+01 4.416e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 03:37:05,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2988750.0, ans=0.125 2024-08-15 03:37:22,813 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9050, loss[loss=0.08946, beats_loss=0.01314, ecapa_loss=0.0001337, whisper_loss=0.07499, over 21810.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001516, whisper_loss=0.09045, over 3885868.98 frames. 
], batch size: 89, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:37:26,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2988850.0, ans=0.1 2024-08-15 03:37:29,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2988850.0, ans=0.1 2024-08-15 03:37:51,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2988950.0, ans=0.05 2024-08-15 03:37:55,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2988950.0, ans=0.2 2024-08-15 03:38:14,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=22.5 2024-08-15 03:38:17,012 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 from AS 2024-08-15 03:38:28,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2989150.0, ans=0.125 2024-08-15 03:38:56,699 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9100, loss[loss=0.09447, beats_loss=0.01261, ecapa_loss=0.0001758, whisper_loss=0.08011, over 20949.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001535, whisper_loss=0.09032, over 3865312.42 frames. ], batch size: 90, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:38:57,771 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 from AS 2024-08-15 03:39:09,447 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
21 from LS+wenet, 27 from Vox, 31 from AS 2024-08-15 03:39:30,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2989450.0, ans=0.125 2024-08-15 03:39:36,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.45 vs. limit=10.0 2024-08-15 03:39:52,104 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS 2024-08-15 03:39:52,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2989550.0, ans=0.125 2024-08-15 03:39:52,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.78 vs. limit=10.0 2024-08-15 03:40:00,195 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 39 from LS+wenet, 19 from Vox, 36 from AS 2024-08-15 03:40:10,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.390e+01 2.731e+01 3.078e+01 3.225e+02, threshold=5.461e+01, percent-clipped=2.0 2024-08-15 03:40:35,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9150, loss[loss=0.1044, beats_loss=0.01185, ecapa_loss=0.0001499, whisper_loss=0.09108, over 21659.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001529, whisper_loss=0.09073, over 3901411.51 frames. ], batch size: 87, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:40:45,817 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 from AS 2024-08-15 03:40:55,547 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 03:41:06,802 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
24 from LS+wenet, 24 from Vox, 47 from AS 2024-08-15 03:41:11,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2990050.0, ans=0.1 2024-08-15 03:41:16,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2990050.0, ans=0.125 2024-08-15 03:41:36,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=12.0 2024-08-15 03:41:42,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2990150.0, ans=0.125 2024-08-15 03:41:45,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2990250.0, ans=0.05 2024-08-15 03:42:04,719 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9200, loss[loss=0.0994, beats_loss=0.009974, ecapa_loss=0.0001599, whisper_loss=0.08783, over 22715.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001528, whisper_loss=0.09062, over 3912102.26 frames. ], batch size: 93, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:42:12,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2990350.0, ans=0.125 2024-08-15 03:42:17,597 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 from AS 2024-08-15 03:42:29,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2990450.0, ans=0.07 2024-08-15 03:42:31,801 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
18 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-15 03:43:12,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.340e+01 2.560e+01 2.896e+01 2.197e+02, threshold=5.119e+01, percent-clipped=4.0 2024-08-15 03:43:26,425 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 03:43:30,007 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 03:43:35,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9250, loss[loss=0.09728, beats_loss=0.008953, ecapa_loss=0.0001883, whisper_loss=0.08644, over 18637.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.000152, whisper_loss=0.09097, over 3912543.76 frames. ], batch size: 75, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:44:15,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2991050.0, ans=0.0 2024-08-15 03:44:31,099 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 14 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 03:44:52,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-15 03:44:57,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2991250.0, ans=0.1 2024-08-15 03:45:08,481 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9300, loss[loss=0.0904, beats_loss=0.01227, ecapa_loss=0.0001524, whisper_loss=0.0766, over 22966.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001523, whisper_loss=0.09119, over 3884143.03 frames. 
], batch size: 93, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:45:30,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2991450.0, ans=0.1 2024-08-15 03:45:40,919 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 03:46:18,973 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.307e+01 2.589e+01 2.834e+01 3.793e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-15 03:46:42,675 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9350, loss[loss=0.09699, beats_loss=0.00921, ecapa_loss=0.0001814, whisper_loss=0.08597, over 19482.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001526, whisper_loss=0.09122, over 3873020.03 frames. ], batch size: 79, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:46:42,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2991850.0, ans=0.1 2024-08-15 03:46:56,507 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 03:47:12,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.98 vs. limit=22.5 2024-08-15 03:47:25,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2992050.0, ans=0.1 2024-08-15 03:47:35,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2992150.0, ans=0.125 2024-08-15 03:47:42,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.27 vs. 
limit=15.0 2024-08-15 03:47:59,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2992250.0, ans=0.125 2024-08-15 03:48:00,520 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 29 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-15 03:48:02,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2992250.0, ans=0.1 2024-08-15 03:48:05,199 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 21 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-15 03:48:08,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9400, loss[loss=0.0918, beats_loss=0.008107, ecapa_loss=0.0001201, whisper_loss=0.08249, over 16042.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01084, ecapa_loss=0.0001508, whisper_loss=0.08943, over 3841383.61 frames. ], batch size: 58, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:48:08,921 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 03:48:12,134 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 03:48:30,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2992450.0, ans=0.2 2024-08-15 03:48:34,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2992450.0, ans=0.0 2024-08-15 03:48:39,894 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
40 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 03:48:41,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2992550.0, ans=0.125 2024-08-15 03:48:41,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2992550.0, ans=0.125 2024-08-15 03:48:55,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2992550.0, ans=0.125 2024-08-15 03:48:55,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=12.0 2024-08-15 03:49:00,156 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 03:49:08,181 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 03:49:11,276 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.345e+01 2.543e+01 2.847e+01 7.002e+01, threshold=5.086e+01, percent-clipped=1.0 2024-08-15 03:49:16,633 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-15 03:49:19,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2992750.0, ans=10.0 2024-08-15 03:49:32,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9450, loss[loss=0.115, beats_loss=0.01048, ecapa_loss=0.000133, whisper_loss=0.1032, over 23886.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.000151, whisper_loss=0.0897, over 3840964.33 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:49:37,312 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 03:49:42,601 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 03:49:44,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2992850.0, ans=0.125 2024-08-15 03:49:51,208 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-15 03:50:32,759 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:50:58,888 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9500, loss[loss=0.1206, beats_loss=0.009325, ecapa_loss=0.0001618, whisper_loss=0.1097, over 13760.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0108, ecapa_loss=0.0001503, whisper_loss=0.08971, over 3867647.26 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:51:02,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2993350.0, ans=0.125 2024-08-15 03:51:05,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2993350.0, ans=0.0 2024-08-15 03:51:33,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2993550.0, ans=0.125 2024-08-15 03:51:50,061 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 03:51:53,434 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 03:51:57,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2993650.0, ans=0.0 2024-08-15 03:52:05,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.283e+01 2.545e+01 2.911e+01 4.109e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 03:52:20,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2993750.0, ans=0.125 2024-08-15 03:52:22,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=15.0 2024-08-15 03:52:29,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9550, loss[loss=0.1061, beats_loss=0.01165, ecapa_loss=0.0001211, whisper_loss=0.09325, over 18388.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001518, whisper_loss=0.09006, over 3867364.89 frames. 
], batch size: 71, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:52:32,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2993850.0, ans=10.0 2024-08-15 03:52:33,913 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.028e-02 2024-08-15 03:52:59,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2993950.0, ans=0.125 2024-08-15 03:52:59,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2993950.0, ans=0.0 2024-08-15 03:53:02,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2993950.0, ans=0.125 2024-08-15 03:53:09,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0 2024-08-15 03:53:34,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2994150.0, ans=0.125 2024-08-15 03:54:00,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9600, loss[loss=0.09652, beats_loss=0.008637, ecapa_loss=0.0001978, whisper_loss=0.08591, over 19728.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001531, whisper_loss=0.09001, over 3847084.23 frames. ], batch size: 83, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:54:13,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-15 03:54:20,168 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
35 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 03:54:22,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2994450.0, ans=0.1 2024-08-15 03:54:23,249 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 03:54:45,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2994550.0, ans=0.125 2024-08-15 03:55:13,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.340e+01 2.536e+01 2.906e+01 4.631e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-15 03:55:13,240 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 03:55:43,067 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9650, loss[loss=0.1172, beats_loss=0.009433, ecapa_loss=0.0001447, whisper_loss=0.1063, over 19635.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001533, whisper_loss=0.0904, over 3853165.24 frames. ], batch size: 76, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:55:55,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2994850.0, ans=0.015 2024-08-15 03:55:55,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2024-08-15 03:56:14,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-15 03:56:16,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2024-08-15 03:56:33,426 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 03:57:02,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2995150.0, ans=0.1 2024-08-15 03:57:25,862 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 19 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-15 03:57:29,331 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9700, loss[loss=0.09955, beats_loss=0.01071, ecapa_loss=0.0001259, whisper_loss=0.08758, over 18676.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.000154, whisper_loss=0.09008, over 3816319.18 frames. ], batch size: 71, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:58:06,446 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:58:07,500 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 03:58:18,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2995450.0, ans=0.125 2024-08-15 03:58:32,478 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 03:58:50,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2995650.0, ans=0.125 2024-08-15 03:59:01,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2024-08-15 03:59:02,382 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 03:59:10,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.634e+01 2.369e+01 2.652e+01 2.894e+01 3.989e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-15 03:59:16,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2995750.0, ans=0.1 2024-08-15 03:59:46,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9750, loss[loss=0.1071, beats_loss=0.01098, ecapa_loss=0.0001596, whisper_loss=0.09456, over 15671.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01065, ecapa_loss=0.0001539, whisper_loss=0.08919, over 3804500.35 frames. ], batch size: 64, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:59:55,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2995850.0, ans=0.0 2024-08-15 04:00:12,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2995950.0, ans=0.0 2024-08-15 04:00:23,505 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 04:00:29,746 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 04:00:50,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2996050.0, ans=0.0 2024-08-15 04:00:50,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. 
limit=22.5 2024-08-15 04:00:51,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2996050.0, ans=0.1 2024-08-15 04:00:56,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2996050.0, ans=0.1 2024-08-15 04:01:08,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2996150.0, ans=0.0 2024-08-15 04:01:19,758 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-15 04:01:21,557 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.810e+01 2024-08-15 04:01:35,613 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-15 04:01:53,538 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-15 04:01:53,964 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9800, loss[loss=0.07442, beats_loss=0.01096, ecapa_loss=0.000141, whisper_loss=0.06204, over 14021.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.000155, whisper_loss=0.08932, over 3809546.73 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:01:59,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=22.5 2024-08-15 04:01:59,981 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 04:02:20,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2996450.0, ans=0.125 2024-08-15 04:02:35,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2996450.0, ans=0.125 2024-08-15 04:02:36,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2996450.0, ans=0.0 2024-08-15 04:03:05,286 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 04:03:07,664 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 04:03:11,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2996650.0, ans=0.2 2024-08-15 04:03:25,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.294e+01 2.579e+01 3.082e+01 3.957e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-15 04:03:25,970 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 04:03:31,694 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.579e-03 2024-08-15 04:03:43,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2996750.0, ans=0.0 2024-08-15 04:03:45,301 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9850, loss[loss=0.07009, beats_loss=0.0133, ecapa_loss=0.00013, whisper_loss=0.05549, over 14756.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01065, ecapa_loss=0.0001536, whisper_loss=0.08897, over 3807310.17 frames. ], batch size: 61, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:03:49,925 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
22 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 04:03:50,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2996850.0, ans=0.0 2024-08-15 04:03:52,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-15 04:04:01,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2996850.0, ans=0.0 2024-08-15 04:04:01,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2996850.0, ans=0.09899494936611666 2024-08-15 04:04:07,477 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-15 04:04:11,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2996950.0, ans=0.1 2024-08-15 04:04:41,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2997150.0, ans=0.125 2024-08-15 04:04:56,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2997250.0, ans=0.5 2024-08-15 04:05:11,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9900, loss[loss=0.1115, beats_loss=0.01179, ecapa_loss=0.0001469, whisper_loss=0.09828, over 23559.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001539, whisper_loss=0.08984, over 3839856.97 frames. ], batch size: 92, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:05:13,417 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 04:05:17,432 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 04:05:35,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2024-08-15 04:05:52,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2997550.0, ans=0.125 2024-08-15 04:05:54,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2997550.0, ans=0.125 2024-08-15 04:06:12,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.308e+01 2.598e+01 3.035e+01 4.044e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 04:06:34,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 9950, loss[loss=0.1072, beats_loss=0.008543, ecapa_loss=0.0001654, whisper_loss=0.09696, over 18699.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0107, ecapa_loss=0.0001535, whisper_loss=0.08959, over 3843593.88 frames. ], batch size: 75, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:06:59,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2997950.0, ans=0.0 2024-08-15 04:07:01,261 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 04:07:19,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2998050.0, ans=0.05 2024-08-15 04:07:22,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2998050.0, ans=0.0 2024-08-15 04:07:37,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.74 vs. 
limit=15.0 2024-08-15 04:07:45,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2998250.0, ans=0.125 2024-08-15 04:08:01,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10000, loss[loss=0.08117, beats_loss=0.01521, ecapa_loss=0.000105, whisper_loss=0.0649, over 14010.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01074, ecapa_loss=0.0001529, whisper_loss=0.08932, over 3830220.59 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:08:01,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2998350.0, ans=0.125 2024-08-15 04:08:07,870 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 04:08:14,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2998350.0, ans=0.1 2024-08-15 04:08:23,444 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:08:26,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2998450.0, ans=0.0 2024-08-15 04:08:36,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2998550.0, ans=0.125 2024-08-15 04:08:40,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2998550.0, ans=0.0 2024-08-15 04:08:58,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2998650.0, ans=0.125 2024-08-15 04:09:02,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.337e+01 2.587e+01 2.892e+01 1.142e+02, threshold=5.175e+01, percent-clipped=1.0 2024-08-15 
04:09:19,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2998750.0, ans=0.125 2024-08-15 04:09:23,497 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10050, loss[loss=0.09234, beats_loss=0.01172, ecapa_loss=0.000107, whisper_loss=0.07955, over 21436.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001529, whisper_loss=0.08958, over 3847741.02 frames. ], batch size: 83, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:09:31,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2998850.0, ans=0.0 2024-08-15 04:09:36,683 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-15 04:09:39,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2024-08-15 04:09:45,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-15 04:09:45,722 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 04:09:49,589 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 04:10:11,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2999050.0, ans=0.125 2024-08-15 04:10:15,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2999150.0, ans=0.125 2024-08-15 04:10:22,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2999150.0, ans=0.0 2024-08-15 04:10:35,750 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-15 04:10:46,283 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-15 04:10:47,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10100, loss[loss=0.09923, beats_loss=0.008747, ecapa_loss=0.0001991, whisper_loss=0.0885, over 21220.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01071, ecapa_loss=0.0001537, whisper_loss=0.08972, over 3875060.50 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:10:51,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-15 04:10:52,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2999350.0, ans=0.025 2024-08-15 04:11:03,383 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 04:11:11,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2999450.0, ans=0.125 2024-08-15 04:11:14,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2999450.0, ans=0.125 2024-08-15 04:11:16,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2024-08-15 04:11:19,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2999550.0, ans=0.1 2024-08-15 04:11:32,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2999650.0, ans=0.2 2024-08-15 04:11:35,474 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 04:11:43,412 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 04:11:44,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.436e+01 2.693e+01 3.002e+01 5.180e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-15 04:11:56,165 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 04:11:56,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2999750.0, ans=0.0 2024-08-15 04:12:05,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10150, loss[loss=0.09781, beats_loss=0.01053, ecapa_loss=0.0001647, whisper_loss=0.08563, over 16685.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001545, whisper_loss=0.09043, over 3891329.61 frames. 
], batch size: 70, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:12:13,376 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 11 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 04:12:18,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2999850.0, ans=0.125 2024-08-15 04:12:24,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2999950.0, ans=0.035 2024-08-15 04:12:31,154 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 04:12:32,447 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 04:12:32,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2999950.0, ans=0.0 2024-08-15 04:12:38,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-15 04:12:40,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3000050.0, ans=0.125 2024-08-15 04:13:09,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3000250.0, ans=0.125 2024-08-15 04:13:23,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=12.0 2024-08-15 04:13:25,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10200, loss[loss=0.09522, beats_loss=0.01221, ecapa_loss=0.0001165, whisper_loss=0.08185, over 24059.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001539, whisper_loss=0.09036, over 3880324.59 frames. 
], batch size: 92, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:13:37,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3000350.0, ans=0.125 2024-08-15 04:13:38,865 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2024-08-15 04:14:02,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2024-08-15 04:14:04,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3000550.0, ans=0.1 2024-08-15 04:14:09,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3000550.0, ans=0.1 2024-08-15 04:14:10,646 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 04:14:13,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3000650.0, ans=0.1 2024-08-15 04:14:23,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.355e+01 2.535e+01 2.807e+01 5.755e+01, threshold=5.070e+01, percent-clipped=1.0 2024-08-15 04:14:24,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3000650.0, ans=0.125 2024-08-15 04:14:34,967 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 04:14:43,175 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10250, loss[loss=0.1021, beats_loss=0.01327, ecapa_loss=0.0001514, whisper_loss=0.08732, over 21322.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001536, whisper_loss=0.09094, over 3920703.39 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:14:54,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3000850.0, ans=0.125 2024-08-15 04:15:00,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3000950.0, ans=0.0 2024-08-15 04:15:03,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3000950.0, ans=0.1 2024-08-15 04:15:12,636 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-15 04:15:17,000 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 04:15:18,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3001050.0, ans=0.125 2024-08-15 04:15:25,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3001050.0, ans=0.09899494936611666 2024-08-15 04:15:28,826 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 04:15:30,226 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-15 04:15:42,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3001150.0, ans=0.1 2024-08-15 04:15:42,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=15.0 2024-08-15 04:16:00,706 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10300, loss[loss=0.107, beats_loss=0.009871, ecapa_loss=0.0001574, whisper_loss=0.0956, over 22875.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001535, whisper_loss=0.09063, over 3917632.61 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:16:13,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3001350.0, ans=0.0 2024-08-15 04:16:34,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3001550.0, ans=0.125 2024-08-15 04:16:41,688 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 04:16:59,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.405e+01 2.691e+01 3.048e+01 4.748e+01, threshold=5.382e+01, percent-clipped=0.0 2024-08-15 04:17:19,575 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10350, loss[loss=0.11, beats_loss=0.008627, ecapa_loss=0.0001325, whisper_loss=0.1, over 24089.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001525, whisper_loss=0.09069, over 3954174.79 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:17:26,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3001850.0, ans=0.125 2024-08-15 04:17:35,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.50 vs. limit=22.5 2024-08-15 04:18:11,383 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-15 04:18:39,188 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-15 04:18:40,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10400, loss[loss=0.1083, beats_loss=0.008941, ecapa_loss=0.0002002, whisper_loss=0.0974, over 18362.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001527, whisper_loss=0.09051, over 3939813.02 frames. ], batch size: 76, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:19:02,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3002450.0, ans=0.1 2024-08-15 04:19:21,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2024-08-15 04:19:23,956 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 04:19:28,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3002650.0, ans=0.1 2024-08-15 04:19:28,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3002650.0, ans=0.125 2024-08-15 04:19:34,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3002650.0, ans=0.125 2024-08-15 04:19:34,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.336e+01 2.571e+01 2.760e+01 5.271e+01, threshold=5.142e+01, percent-clipped=0.0 2024-08-15 04:19:52,320 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 04:19:52,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3002850.0, ans=15.0 2024-08-15 04:19:53,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10450, loss[loss=0.1081, beats_loss=0.01043, ecapa_loss=0.0001527, whisper_loss=0.09612, over 16414.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01073, ecapa_loss=0.0001517, whisper_loss=0.08977, over 3894300.05 frames. ], batch size: 65, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:20:05,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3002850.0, ans=0.125 2024-08-15 04:20:16,150 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 04:20:17,656 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-15 04:20:25,336 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 04:20:31,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3003050.0, ans=0.125 2024-08-15 04:20:40,474 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 04:20:43,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3003150.0, ans=0.125 2024-08-15 04:20:44,396 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-15 04:21:04,010 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10500, loss[loss=0.08964, beats_loss=0.01101, ecapa_loss=0.0001357, whisper_loss=0.07727, over 21203.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001532, whisper_loss=0.09028, over 3895470.59 frames. 
], batch size: 83, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:21:12,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3003350.0, ans=0.1 2024-08-15 04:21:13,359 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 04:21:16,355 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 04:21:27,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3003450.0, ans=0.0 2024-08-15 04:21:28,538 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 04:21:36,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3003550.0, ans=0.125 2024-08-15 04:21:45,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3003650.0, ans=0.125 2024-08-15 04:21:46,112 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 04:21:46,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3003650.0, ans=0.0 2024-08-15 04:21:54,328 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.267e+01 2.471e+01 2.846e+01 8.765e+01, threshold=4.941e+01, percent-clipped=1.0 2024-08-15 04:22:12,628 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10550, loss[loss=0.1101, beats_loss=0.009411, ecapa_loss=0.0001811, whisper_loss=0.09888, over 22069.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001529, whisper_loss=0.09089, over 3886109.87 frames. 
], batch size: 90, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:22:18,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3003850.0, ans=0.0 2024-08-15 04:22:28,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3003950.0, ans=0.125 2024-08-15 04:22:32,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.98 vs. limit=10.0 2024-08-15 04:22:43,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3004050.0, ans=0.2 2024-08-15 04:22:47,738 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 04:23:03,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3004150.0, ans=0.125 2024-08-15 04:23:21,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10600, loss[loss=0.1093, beats_loss=0.01068, ecapa_loss=0.0001417, whisper_loss=0.0972, over 23322.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001529, whisper_loss=0.09039, over 3879894.83 frames. ], batch size: 92, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:23:22,518 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0 2024-08-15 04:23:24,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3004350.0, ans=0.125 2024-08-15 04:23:28,541 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 04:23:32,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3004350.0, ans=0.0 2024-08-15 04:23:33,894 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 04:23:34,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-08-15 04:23:36,858 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 04:23:41,972 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 04:23:57,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3004550.0, ans=0.0 2024-08-15 04:24:00,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0 2024-08-15 04:24:07,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3004650.0, ans=0.0 2024-08-15 04:24:12,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.388e+01 2.629e+01 3.044e+01 4.366e+02, threshold=5.258e+01, percent-clipped=2.0 2024-08-15 04:24:12,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3004650.0, ans=0.125 2024-08-15 04:24:12,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3004650.0, ans=0.2 2024-08-15 04:24:25,310 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 04:24:30,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3004850.0, ans=0.125 2024-08-15 04:24:30,787 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10650, loss[loss=0.08472, beats_loss=0.01186, ecapa_loss=0.0001432, whisper_loss=0.07143, over 21012.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01057, ecapa_loss=0.000152, whisper_loss=0.09137, over 3902322.36 frames. ], batch size: 84, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:24:53,224 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 29 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-15 04:24:53,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3004950.0, ans=0.125 2024-08-15 04:25:14,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-08-15 04:25:16,663 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 04:25:19,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3005150.0, ans=0.0 2024-08-15 04:25:24,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.68 vs. limit=22.5 2024-08-15 04:25:43,965 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10700, loss[loss=0.1187, beats_loss=0.01004, ecapa_loss=0.0001407, whisper_loss=0.1072, over 23813.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01054, ecapa_loss=0.0001519, whisper_loss=0.09196, over 3923227.01 frames. 
], batch size: 91, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:25:46,110 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:25:53,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3005350.0, ans=0.0 2024-08-15 04:25:58,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3005450.0, ans=0.0 2024-08-15 04:26:04,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3005450.0, ans=0.125 2024-08-15 04:26:06,819 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-15 04:26:08,152 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 04:26:16,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3005550.0, ans=0.125 2024-08-15 04:26:21,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3005550.0, ans=0.0 2024-08-15 04:26:25,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3005550.0, ans=0.0 2024-08-15 04:26:32,622 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 27 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-15 04:26:38,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3005650.0, ans=0.125 2024-08-15 04:26:39,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.322e+01 2.552e+01 2.947e+01 1.324e+02, threshold=5.105e+01, percent-clipped=0.0 2024-08-15 04:26:42,915 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
33 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 04:26:52,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3005750.0, ans=0.0 2024-08-15 04:26:54,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3005750.0, ans=0.125 2024-08-15 04:26:59,472 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10750, loss[loss=0.09187, beats_loss=0.01227, ecapa_loss=0.0001302, whisper_loss=0.07829, over 21493.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001519, whisper_loss=0.09165, over 3899350.62 frames. ], batch size: 86, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:27:01,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=22.5 2024-08-15 04:27:05,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3005850.0, ans=0.0 2024-08-15 04:27:28,658 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-15 04:27:36,331 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 04:27:43,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3006050.0, ans=0.035 2024-08-15 04:27:51,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2024-08-15 04:27:55,180 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
41 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-15 04:28:09,604 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.141e-01 2024-08-15 04:28:16,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10800, loss[loss=0.1211, beats_loss=0.009973, ecapa_loss=0.0001897, whisper_loss=0.1092, over 21714.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001528, whisper_loss=0.0909, over 3888661.18 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:28:28,052 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:28:28,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3006350.0, ans=0.2 2024-08-15 04:28:36,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3006450.0, ans=0.125 2024-08-15 04:28:59,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3006550.0, ans=0.07 2024-08-15 04:29:12,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.426e+01 2.732e+01 3.113e+01 1.619e+02, threshold=5.464e+01, percent-clipped=2.0 2024-08-15 04:29:31,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10850, loss[loss=0.1041, beats_loss=0.01148, ecapa_loss=0.0001629, whisper_loss=0.09101, over 16817.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001527, whisper_loss=0.09136, over 3903862.50 frames. 
], batch size: 66, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:29:41,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3006850.0, ans=0.125 2024-08-15 04:29:47,324 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.609e-03 2024-08-15 04:30:04,603 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 40 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 04:30:22,884 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 04:30:23,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3007150.0, ans=0.125 2024-08-15 04:30:50,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10900, loss[loss=0.07849, beats_loss=0.01164, ecapa_loss=0.0001606, whisper_loss=0.06524, over 21674.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01066, ecapa_loss=0.0001525, whisper_loss=0.09177, over 3937305.57 frames. ], batch size: 93, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:30:53,957 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 04:31:12,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3007450.0, ans=0.1 2024-08-15 04:31:17,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3007450.0, ans=0.2 2024-08-15 04:31:25,972 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 04:31:26,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3007550.0, ans=0.125 2024-08-15 04:31:36,539 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
30 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 04:31:47,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.315e+01 2.550e+01 2.913e+01 4.386e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-15 04:31:57,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3007750.0, ans=0.125 2024-08-15 04:32:06,992 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 10950, loss[loss=0.1132, beats_loss=0.009572, ecapa_loss=0.0001478, whisper_loss=0.1021, over 15030.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01064, ecapa_loss=0.0001511, whisper_loss=0.09184, over 3937402.41 frames. ], batch size: 59, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:32:08,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3007850.0, ans=0.125 2024-08-15 04:32:25,312 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 04:32:36,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3008050.0, ans=0.1 2024-08-15 04:32:37,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3008050.0, ans=0.0 2024-08-15 04:32:49,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3008050.0, ans=0.0 2024-08-15 04:32:53,815 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
27 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-15 04:32:54,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3008150.0, ans=0.1 2024-08-15 04:33:03,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3008150.0, ans=0.1 2024-08-15 04:33:10,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3008250.0, ans=0.125 2024-08-15 04:33:13,181 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 04:33:22,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11000, loss[loss=0.08358, beats_loss=0.01274, ecapa_loss=0.0001412, whisper_loss=0.06944, over 22881.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001524, whisper_loss=0.09093, over 3946369.25 frames. ], batch size: 93, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:33:40,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3008450.0, ans=0.125 2024-08-15 04:34:05,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-15 04:34:18,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3008650.0, ans=0.125 2024-08-15 04:34:20,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.435e+01 2.579e+01 2.993e+01 2.045e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-15 04:34:24,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3008750.0, ans=0.0 2024-08-15 04:34:32,946 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 04:34:37,244 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11050, loss[loss=0.08514, beats_loss=0.01317, ecapa_loss=0.000113, whisper_loss=0.07084, over 19207.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001529, whisper_loss=0.09101, over 3930615.80 frames. ], batch size: 77, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:34:50,298 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 17 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-15 04:34:53,455 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 04:35:01,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3008950.0, ans=0.0 2024-08-15 04:35:10,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3009050.0, ans=0.1 2024-08-15 04:35:19,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3009050.0, ans=0.0 2024-08-15 04:35:20,621 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 04:35:29,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3009150.0, ans=0.1 2024-08-15 04:35:51,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3009350.0, ans=0.1 2024-08-15 04:35:52,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11100, loss[loss=0.1099, beats_loss=0.01031, ecapa_loss=0.0001325, whisper_loss=0.09827, over 20996.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001526, whisper_loss=0.09116, over 3926078.65 frames. 
], batch size: 81, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:36:05,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3009350.0, ans=0.2 2024-08-15 04:36:09,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3009450.0, ans=0.125 2024-08-15 04:36:14,028 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 04:36:16,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3009450.0, ans=0.1 2024-08-15 04:36:26,993 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-15 04:36:29,651 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-15 04:36:34,489 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-15 04:36:44,737 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-15 04:36:48,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.388e+01 2.670e+01 2.959e+01 6.163e+01, threshold=5.341e+01, percent-clipped=1.0 2024-08-15 04:37:07,420 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11150, loss[loss=0.1192, beats_loss=0.008109, ecapa_loss=0.0001749, whisper_loss=0.1093, over 18118.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001527, whisper_loss=0.0916, over 3888346.01 frames. ], batch size: 73, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:37:24,126 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 04:37:31,245 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 04:37:31,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3009950.0, ans=0.2 2024-08-15 04:37:39,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3010050.0, ans=0.05 2024-08-15 04:37:47,987 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 04:37:56,755 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 04:38:01,029 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 04:38:09,394 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 04:38:19,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11200, loss[loss=0.09997, beats_loss=0.01168, ecapa_loss=0.000169, whisper_loss=0.0866, over 16567.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001527, whisper_loss=0.09107, over 3853088.45 frames. ], batch size: 70, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:38:31,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3010350.0, ans=0.2 2024-08-15 04:38:36,099 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.940e+01 2024-08-15 04:38:36,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3010450.0, ans=0.125 2024-08-15 04:38:47,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. 
limit=10.0 2024-08-15 04:38:50,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3010550.0, ans=0.1 2024-08-15 04:39:15,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.332e+01 2.561e+01 2.829e+01 4.358e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-15 04:39:27,883 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 04:39:33,974 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11250, loss[loss=0.1021, beats_loss=0.01137, ecapa_loss=0.0001425, whisper_loss=0.08928, over 18510.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.000153, whisper_loss=0.09066, over 3861602.06 frames. ], batch size: 76, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:39:55,943 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.592e+01 2024-08-15 04:40:00,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3010950.0, ans=0.125 2024-08-15 04:40:12,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3011050.0, ans=0.0 2024-08-15 04:40:30,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3011150.0, ans=0.125 2024-08-15 04:40:50,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11300, loss[loss=0.07669, beats_loss=0.01316, ecapa_loss=0.0001472, whisper_loss=0.06206, over 13986.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001533, whisper_loss=0.0907, over 3858935.89 frames. 
], batch size: 57, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:40:59,509 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:41:00,734 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 04:41:29,878 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 04:41:31,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3011550.0, ans=0.0 2024-08-15 04:41:35,721 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 04:41:37,509 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:41:37,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3011650.0, ans=0.125 2024-08-15 04:41:52,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.314e+01 2.562e+01 2.942e+01 5.561e+01, threshold=5.125e+01, percent-clipped=1.0 2024-08-15 04:41:53,437 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 04:42:10,030 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11350, loss[loss=0.1069, beats_loss=0.009197, ecapa_loss=0.0001526, whisper_loss=0.09615, over 22255.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001524, whisper_loss=0.09114, over 3896981.51 frames. ], batch size: 91, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:42:20,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3011850.0, ans=0.025 2024-08-15 04:42:29,639 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
23 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-15 04:42:37,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3011950.0, ans=0.0 2024-08-15 04:42:37,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3011950.0, ans=0.1 2024-08-15 04:42:53,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3012050.0, ans=0.1 2024-08-15 04:43:01,761 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 04:43:02,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3012150.0, ans=0.125 2024-08-15 04:43:03,378 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 04:43:11,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3012250.0, ans=0.1 2024-08-15 04:43:24,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3012350.0, ans=0.0 2024-08-15 04:43:25,266 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11400, loss[loss=0.08875, beats_loss=0.01025, ecapa_loss=0.0001618, whisper_loss=0.07687, over 13985.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.000152, whisper_loss=0.09123, over 3873514.53 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:43:30,259 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 04:43:35,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3012350.0, ans=0.125 2024-08-15 04:43:42,154 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-15 04:43:42,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.99 vs. limit=10.0 2024-08-15 04:43:45,885 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 04:44:09,744 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-15 04:44:19,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3012650.0, ans=0.0 2024-08-15 04:44:20,195 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 04:44:20,917 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5 2024-08-15 04:44:22,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.412e+01 2.712e+01 2.971e+01 3.918e+01, threshold=5.424e+01, percent-clipped=0.0 2024-08-15 04:44:34,276 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-15 04:44:38,697 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 04:44:39,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11450, loss[loss=0.09196, beats_loss=0.01092, ecapa_loss=0.0001679, whisper_loss=0.07936, over 20313.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001526, whisper_loss=0.09053, over 3875308.41 frames. 
], batch size: 85, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:44:48,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3012850.0, ans=0.125 2024-08-15 04:44:53,112 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 27 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-15 04:44:54,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=22.5 2024-08-15 04:45:16,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3013050.0, ans=0.1 2024-08-15 04:45:25,641 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 04:45:39,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3013250.0, ans=0.1 2024-08-15 04:45:43,556 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 04:45:52,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2024-08-15 04:45:55,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11500, loss[loss=0.1109, beats_loss=0.008801, ecapa_loss=0.0001377, whisper_loss=0.1007, over 23285.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001514, whisper_loss=0.09093, over 3895807.95 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:46:13,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.96 vs. 
limit=15.0 2024-08-15 04:46:52,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.370e+01 2.550e+01 2.848e+01 7.027e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-15 04:47:08,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11550, loss[loss=0.1386, beats_loss=0.007757, ecapa_loss=0.0001519, whisper_loss=0.1294, over 17674.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001527, whisper_loss=0.09095, over 3890467.00 frames. ], batch size: 63, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:47:14,958 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2024-08-15 04:47:39,874 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 04:47:45,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=12.0 2024-08-15 04:48:29,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11600, loss[loss=0.09667, beats_loss=0.01141, ecapa_loss=0.000165, whisper_loss=0.08361, over 21625.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01049, ecapa_loss=0.0001523, whisper_loss=0.09152, over 3902497.30 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:48:35,360 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 04:48:41,897 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 04:49:02,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3014550.0, ans=0.0 2024-08-15 04:49:24,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3014650.0, ans=0.125 2024-08-15 04:49:32,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.366e+01 2.590e+01 2.931e+01 3.199e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-15 04:49:49,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11650, loss[loss=0.1076, beats_loss=0.01091, ecapa_loss=0.0001421, whisper_loss=0.09523, over 21767.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001513, whisper_loss=0.09096, over 3916018.49 frames. ], batch size: 87, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:49:50,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.23 vs. 
limit=15.0 2024-08-15 04:50:04,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3014950.0, ans=0.125 2024-08-15 04:50:28,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3015050.0, ans=0.2 2024-08-15 04:50:49,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3015150.0, ans=0.0 2024-08-15 04:50:56,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3015250.0, ans=0.0 2024-08-15 04:51:01,375 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:51:06,672 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11700, loss[loss=0.1219, beats_loss=0.01152, ecapa_loss=0.0001466, whisper_loss=0.1089, over 23656.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09064, over 3931765.56 frames. ], batch size: 94, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:51:09,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3015350.0, ans=0.0 2024-08-15 04:51:09,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=12.0 2024-08-15 04:51:11,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. 
limit=15.0 2024-08-15 04:51:27,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3015450.0, ans=0.125 2024-08-15 04:51:30,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3015450.0, ans=0.125 2024-08-15 04:51:48,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3015550.0, ans=0.0 2024-08-15 04:52:04,992 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 04:52:05,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3015650.0, ans=0.125 2024-08-15 04:52:08,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.391e+01 2.584e+01 2.894e+01 1.234e+02, threshold=5.167e+01, percent-clipped=2.0 2024-08-15 04:52:26,667 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11750, loss[loss=0.1222, beats_loss=0.009893, ecapa_loss=0.0001441, whisper_loss=0.1109, over 18704.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001508, whisper_loss=0.09072, over 3953173.50 frames. ], batch size: 73, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:52:34,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3015850.0, ans=0.2 2024-08-15 04:52:35,089 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.54 vs. 
limit=22.5 2024-08-15 04:53:12,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3016050.0, ans=0.0 2024-08-15 04:53:22,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3016150.0, ans=0.05 2024-08-15 04:53:30,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2024-08-15 04:53:47,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11800, loss[loss=0.109, beats_loss=0.01042, ecapa_loss=0.000131, whisper_loss=0.09728, over 21872.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.00015, whisper_loss=0.09056, over 3935296.18 frames. ], batch size: 86, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:53:49,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3016350.0, ans=0.1 2024-08-15 04:54:00,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3016350.0, ans=0.1 2024-08-15 04:54:01,790 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 04:54:16,096 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 04:54:17,560 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 04:54:42,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. 
limit=15.0 2024-08-15 04:54:47,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.361e+01 2.696e+01 3.037e+01 7.582e+01, threshold=5.392e+01, percent-clipped=2.0 2024-08-15 04:54:50,815 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 04:54:56,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3016750.0, ans=0.0 2024-08-15 04:54:58,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3016750.0, ans=0.1 2024-08-15 04:55:01,059 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 04:55:01,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3016750.0, ans=0.125 2024-08-15 04:55:04,538 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=12.0 2024-08-15 04:55:05,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11850, loss[loss=0.08519, beats_loss=0.01113, ecapa_loss=0.0001543, whisper_loss=0.07252, over 20760.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001495, whisper_loss=0.09047, over 3932244.73 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:55:05,197 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 21 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-15 04:55:21,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3016950.0, ans=0.0 2024-08-15 04:55:30,533 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 04:55:43,079 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 04:56:01,939 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-15 04:56:05,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3017250.0, ans=0.2 2024-08-15 04:56:14,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3017250.0, ans=0.0 2024-08-15 04:56:15,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3017250.0, ans=0.125 2024-08-15 04:56:20,988 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11900, loss[loss=0.12, beats_loss=0.009706, ecapa_loss=0.000134, whisper_loss=0.109, over 22906.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001515, whisper_loss=0.09075, over 3939713.88 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:56:47,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2024-08-15 04:56:56,550 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 04:57:04,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3017550.0, ans=0.125 2024-08-15 04:57:04,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3017550.0, ans=0.0 2024-08-15 04:57:08,282 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 04:57:17,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3017650.0, ans=0.0 2024-08-15 04:57:20,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.277e+01 2.486e+01 2.850e+01 3.770e+01, threshold=4.972e+01, percent-clipped=0.0 2024-08-15 04:57:29,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3017750.0, ans=0.0 2024-08-15 04:57:31,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=22.5 2024-08-15 04:57:35,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 11950, loss[loss=0.1172, beats_loss=0.009254, ecapa_loss=0.0001494, whisper_loss=0.1064, over 19364.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001516, whisper_loss=0.09135, over 3915199.46 frames. ], batch size: 76, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:57:36,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3017850.0, ans=0.1 2024-08-15 04:57:53,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=12.0 2024-08-15 04:57:54,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.32 vs. 
limit=10.0 2024-08-15 04:57:56,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3017950.0, ans=0.125 2024-08-15 04:58:02,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3017950.0, ans=0.0 2024-08-15 04:58:08,092 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 04:58:09,320 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 04:58:14,250 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-15 04:58:26,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3018150.0, ans=0.125 2024-08-15 04:58:34,911 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 04:58:39,320 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 04:58:48,136 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12000, loss[loss=0.1, beats_loss=0.009667, ecapa_loss=0.0001591, whisper_loss=0.08876, over 15296.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001519, whisper_loss=0.09074, over 3906888.98 frames. 
], batch size: 60, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:58:48,137 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 04:59:05,671 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9612, 3.1926, 2.2827, 3.4051], device='cuda:1') 2024-08-15 04:59:32,720 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005394, whisper_loss=0.2473, over 922467.00 frames. 2024-08-15 04:59:52,983 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on SV_voxceleb1: loss=0.004335, beats_loss=0, ecapa_loss=0.0004335, whisper_loss=0, over 939242.00 frames. 2024-08-15 05:00:31,036 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6984, 4.1887, 2.9032, 4.5076], device='cuda:1') 2024-08-15 05:01:54,683 INFO [train_multi_KD3.py:1149] (1/4) Epoch 21, validation on AT_audioset: loss=0.02336, beats_loss=0.02336, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 05:01:54,687 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 05:02:11,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2024-08-15 05:02:18,255 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 05:02:22,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3018550.0, ans=0.2 2024-08-15 05:02:31,690 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-15 05:02:49,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3018650.0, ans=0.2 2024-08-15 05:02:50,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.329e+01 2.556e+01 2.882e+01 4.155e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 05:02:58,691 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 05:03:01,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3018750.0, ans=0.0 2024-08-15 05:03:04,938 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12050, loss[loss=0.1013, beats_loss=0.0101, ecapa_loss=0.0001502, whisper_loss=0.08975, over 23193.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001511, whisper_loss=0.09057, over 3862473.24 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:03:06,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3018850.0, ans=0.125 2024-08-15 05:03:25,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-15 05:03:32,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=12.0 2024-08-15 05:03:39,212 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 05:03:41,828 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 05:03:54,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3019150.0, ans=0.0 2024-08-15 05:03:54,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3019150.0, ans=0.0 2024-08-15 05:03:56,929 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 19 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-15 05:04:09,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3019250.0, ans=0.125 2024-08-15 05:04:12,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3019350.0, ans=0.0 2024-08-15 05:04:13,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12100, loss[loss=0.1031, beats_loss=0.01168, ecapa_loss=0.0001495, whisper_loss=0.08989, over 17951.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001521, whisper_loss=0.09083, over 3862961.54 frames. ], batch size: 73, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:04:27,077 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 05:04:37,497 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 05:04:45,881 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.238e-01 2024-08-15 05:04:56,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3019650.0, ans=0.125 2024-08-15 05:05:04,648 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:05:07,114 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-15 05:05:09,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.350e+01 2.548e+01 2.785e+01 3.671e+01, threshold=5.096e+01, percent-clipped=0.0 2024-08-15 05:05:17,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3019750.0, ans=0.0 2024-08-15 05:05:26,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12150, loss[loss=0.1155, beats_loss=0.01166, ecapa_loss=0.0001482, whisper_loss=0.1024, over 22849.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001525, whisper_loss=0.09078, over 3854501.99 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:05:33,503 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 05:05:35,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=12.0 2024-08-15 05:05:49,082 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-15 05:06:10,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3020050.0, ans=0.2 2024-08-15 05:06:14,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3020150.0, ans=0.1 2024-08-15 05:06:18,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3020150.0, ans=0.125 2024-08-15 05:06:18,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3020150.0, ans=0.125 2024-08-15 05:06:27,272 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
21 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-15 05:06:28,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3020250.0, ans=0.1 2024-08-15 05:06:30,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3020250.0, ans=0.125 2024-08-15 05:06:33,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3020250.0, ans=0.125 2024-08-15 05:06:40,329 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 05:06:42,148 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 05:06:42,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3020250.0, ans=0.2 2024-08-15 05:06:46,610 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12200, loss[loss=0.1119, beats_loss=0.006576, ecapa_loss=0.0001419, whisper_loss=0.1039, over 15724.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001512, whisper_loss=0.09118, over 3872987.25 frames. 
], batch size: 57, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:07:26,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3020550.0, ans=0.0 2024-08-15 05:07:43,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3020650.0, ans=0.125 2024-08-15 05:07:45,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.310e+01 2.623e+01 3.026e+01 6.571e+01, threshold=5.245e+01, percent-clipped=3.0 2024-08-15 05:07:50,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3020750.0, ans=0.5 2024-08-15 05:07:50,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3020750.0, ans=0.0 2024-08-15 05:07:57,717 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 05:07:59,005 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 05:08:03,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12250, loss[loss=0.1037, beats_loss=0.009362, ecapa_loss=0.0001411, whisper_loss=0.09296, over 21134.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001519, whisper_loss=0.09144, over 3887485.40 frames. ], batch size: 80, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:08:07,702 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:08:16,683 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 05:08:28,364 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. 
limit=15.0 2024-08-15 05:08:32,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3020950.0, ans=0.125 2024-08-15 05:08:51,358 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 05:08:54,930 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2024-08-15 05:09:11,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=12.0 2024-08-15 05:09:20,936 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12300, loss[loss=0.08823, beats_loss=0.01442, ecapa_loss=0.0001315, whisper_loss=0.0725, over 17876.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001514, whisper_loss=0.09104, over 3880256.92 frames. ], batch size: 75, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:09:21,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2024-08-15 05:09:29,450 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 05:09:37,755 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.95 vs. limit=10.0 2024-08-15 05:09:49,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3021450.0, ans=0.0 2024-08-15 05:09:56,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. 
limit=5.0 2024-08-15 05:10:01,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3021550.0, ans=0.1 2024-08-15 05:10:15,752 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 05:10:22,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.390e+01 2.646e+01 2.944e+01 2.237e+02, threshold=5.293e+01, percent-clipped=1.0 2024-08-15 05:10:23,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2024-08-15 05:10:28,479 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-15 05:10:38,177 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12350, loss[loss=0.1073, beats_loss=0.009554, ecapa_loss=0.0001856, whisper_loss=0.0959, over 20974.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001521, whisper_loss=0.09132, over 3917103.45 frames. ], batch size: 90, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:10:40,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=15.0 2024-08-15 05:10:43,485 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 05:10:52,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3021950.0, ans=10.0 2024-08-15 05:11:05,500 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 05:11:10,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.98 vs. 
limit=15.0 2024-08-15 05:11:12,726 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-15 05:11:19,938 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.374e-02 2024-08-15 05:11:20,804 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-15 05:11:30,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2024-08-15 05:11:50,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12400, loss[loss=0.1045, beats_loss=0.01428, ecapa_loss=0.0001389, whisper_loss=0.08885, over 21465.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001531, whisper_loss=0.09162, over 3935373.64 frames. ], batch size: 90, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:12:02,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3022450.0, ans=0.125 2024-08-15 05:12:07,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3022450.0, ans=0.0 2024-08-15 05:12:11,133 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-15 05:12:14,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3022450.0, ans=0.1 2024-08-15 05:12:25,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3022550.0, ans=0.05 2024-08-15 05:12:33,993 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 05:12:42,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.329e+01 2.587e+01 2.851e+01 3.829e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-15 05:12:49,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3022750.0, ans=0.125 2024-08-15 05:12:58,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12450, loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.09051, over 16122.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01052, ecapa_loss=0.0001527, whisper_loss=0.09161, over 3933234.22 frames. ], batch size: 62, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:13:01,039 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 05:13:04,995 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 05:13:13,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3022950.0, ans=0.2 2024-08-15 05:13:24,061 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 05:13:24,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-15 05:13:32,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3023050.0, ans=0.07 2024-08-15 05:13:40,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3023150.0, ans=0.1 2024-08-15 05:14:04,752 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12500, loss[loss=0.09194, beats_loss=0.01032, ecapa_loss=0.000151, whisper_loss=0.08012, over 18733.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001517, whisper_loss=0.09089, over 3899596.31 frames. ], batch size: 76, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:14:16,954 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 05:14:17,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3023450.0, ans=0.1 2024-08-15 05:14:19,826 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-15 05:14:32,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3023550.0, ans=0.125 2024-08-15 05:14:33,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3023550.0, ans=0.125 2024-08-15 05:14:37,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-08-15 05:14:39,479 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-15 05:14:41,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3023550.0, ans=0.125 2024-08-15 05:14:54,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=12.0 2024-08-15 05:14:56,527 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
27 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-15 05:14:57,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.569e+01 2.941e+01 3.163e+02, threshold=5.138e+01, percent-clipped=2.0 2024-08-15 05:15:03,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3023750.0, ans=0.0 2024-08-15 05:15:12,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12550, loss[loss=0.1071, beats_loss=0.01019, ecapa_loss=0.0001281, whisper_loss=0.09567, over 18447.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001515, whisper_loss=0.09078, over 3921268.31 frames. ], batch size: 70, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:15:14,079 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 05:15:19,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3023850.0, ans=0.0 2024-08-15 05:15:32,949 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:15:52,597 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 05:16:10,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.53 vs. limit=10.0 2024-08-15 05:16:16,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3024250.0, ans=0.125 2024-08-15 05:16:16,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. 
limit=15.0 2024-08-15 05:16:20,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12600, loss[loss=0.09884, beats_loss=0.01046, ecapa_loss=0.0001662, whisper_loss=0.08672, over 22723.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001527, whisper_loss=0.09129, over 3924904.35 frames. ], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:16:29,637 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 05:16:30,891 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 05:16:52,541 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 05:17:10,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-08-15 05:17:13,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.257e+01 2.680e+01 2.970e+01 2.910e+02, threshold=5.361e+01, percent-clipped=1.0 2024-08-15 05:17:25,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3024750.0, ans=0.0 2024-08-15 05:17:26,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3024850.0, ans=0.125 2024-08-15 05:17:27,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12650, loss[loss=0.09727, beats_loss=0.01233, ecapa_loss=0.0001171, whisper_loss=0.08377, over 21353.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001532, whisper_loss=0.0915, over 3944161.68 frames. ], batch size: 82, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:17:31,155 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-15 05:17:39,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3024950.0, ans=0.0 2024-08-15 05:17:46,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3024950.0, ans=0.1 2024-08-15 05:18:09,989 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-15 05:18:33,337 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12700, loss[loss=0.1125, beats_loss=0.009723, ecapa_loss=0.0001489, whisper_loss=0.1013, over 20084.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001526, whisper_loss=0.09166, over 3894629.22 frames. ], batch size: 77, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:18:43,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2024-08-15 05:18:50,191 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-15 05:18:54,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3025450.0, ans=0.0 2024-08-15 05:18:56,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3025450.0, ans=0.125 2024-08-15 05:18:56,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3025450.0, ans=0.0 2024-08-15 05:18:59,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3025550.0, ans=0.0 2024-08-15 05:19:11,268 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 05:19:15,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-08-15 05:19:18,040 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-15 05:19:23,198 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 05:19:26,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.352e+01 2.609e+01 2.982e+01 1.854e+02, threshold=5.218e+01, percent-clipped=2.0 2024-08-15 05:19:30,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3025750.0, ans=0.1 2024-08-15 05:19:39,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12750, loss[loss=0.1077, beats_loss=0.009997, ecapa_loss=0.0001146, whisper_loss=0.09651, over 15148.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001527, whisper_loss=0.09178, over 3883909.58 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:19:56,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0 2024-08-15 05:20:06,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. 
limit=15.0 2024-08-15 05:20:13,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3026050.0, ans=0.07 2024-08-15 05:20:17,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3026050.0, ans=0.125 2024-08-15 05:20:22,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-15 05:20:24,892 WARNING [optim.py:496] (1/4) Scaling gradients by 0.023750245571136475, model_norm_threshold=52.18341064453125 2024-08-15 05:20:25,067 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.496e+05, grad_sumsq=7.496e+05, orig_rms_sq=1.000e+00 2024-08-15 05:20:45,886 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12800, loss[loss=0.08911, beats_loss=0.01101, ecapa_loss=0.0001818, whisper_loss=0.07628, over 16142.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.000153, whisper_loss=0.09164, over 3887179.71 frames. ], batch size: 68, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:20:49,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3026350.0, ans=0.0 2024-08-15 05:20:51,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3026350.0, ans=0.125 2024-08-15 05:21:06,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3026450.0, ans=0.0 2024-08-15 05:21:33,927 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
29 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 05:21:39,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.369e+01 2.658e+01 2.978e+01 2.197e+03, threshold=5.317e+01, percent-clipped=3.0 2024-08-15 05:21:52,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12850, loss[loss=0.08058, beats_loss=0.01205, ecapa_loss=0.0001981, whisper_loss=0.06655, over 13793.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001535, whisper_loss=0.09123, over 3873353.40 frames. ], batch size: 59, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:21:55,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3026850.0, ans=0.2 2024-08-15 05:22:11,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3026950.0, ans=0.125 2024-08-15 05:22:11,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0 2024-08-15 05:22:27,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3027050.0, ans=0.1 2024-08-15 05:22:38,265 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 35 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 05:22:38,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3027150.0, ans=0.0 2024-08-15 05:22:42,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3027150.0, ans=0.125 2024-08-15 05:22:47,751 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 05:22:47,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3027250.0, ans=0.0 2024-08-15 05:22:59,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12900, loss[loss=0.08836, beats_loss=0.01314, ecapa_loss=0.0001295, whisper_loss=0.07393, over 22708.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001523, whisper_loss=0.09074, over 3849615.95 frames. ], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:23:09,358 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 05:23:24,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3027450.0, ans=0.125 2024-08-15 05:23:27,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3027550.0, ans=0.125 2024-08-15 05:23:32,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3027550.0, ans=0.2 2024-08-15 05:23:34,943 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 05:23:44,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. 
limit=10.0 2024-08-15 05:23:53,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.303e+01 2.501e+01 2.765e+01 4.358e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-15 05:23:55,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3027750.0, ans=0.0 2024-08-15 05:24:06,525 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 12950, loss[loss=0.1105, beats_loss=0.009373, ecapa_loss=0.0001651, whisper_loss=0.09944, over 16225.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.0001521, whisper_loss=0.0899, over 3847725.80 frames. ], batch size: 62, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:24:16,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3027850.0, ans=0.125 2024-08-15 05:24:32,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3027950.0, ans=0.125 2024-08-15 05:24:49,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3028150.0, ans=0.125 2024-08-15 05:24:52,505 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 05:24:53,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3028150.0, ans=0.1 2024-08-15 05:25:08,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2024-08-15 05:25:13,670 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13000, loss[loss=0.1007, beats_loss=0.007348, ecapa_loss=0.0001771, whisper_loss=0.09154, over 16068.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001533, whisper_loss=0.09135, over 3883248.41 frames. ], batch size: 62, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:25:25,821 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 05:25:37,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3028450.0, ans=0.0 2024-08-15 05:25:37,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2024-08-15 05:25:53,388 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 26 from Vox, 17 fro AS 2024-08-15 05:26:03,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2024-08-15 05:26:07,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.449e+01 2.686e+01 3.098e+01 1.940e+02, threshold=5.373e+01, percent-clipped=2.0 2024-08-15 05:26:12,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3028750.0, ans=0.125 2024-08-15 05:26:20,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3028850.0, ans=0.0 2024-08-15 05:26:21,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13050, loss[loss=0.1095, beats_loss=0.01192, ecapa_loss=0.0001227, whisper_loss=0.09636, over 16892.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001535, whisper_loss=0.09093, over 3891803.48 frames. 
], batch size: 67, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:26:27,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3028850.0, ans=0.1 2024-08-15 05:26:42,686 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 05:26:53,221 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 05:26:53,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-15 05:27:26,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3029250.0, ans=0.125 2024-08-15 05:27:32,070 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 05:27:33,578 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13100, loss[loss=0.1113, beats_loss=0.009528, ecapa_loss=0.000157, whisper_loss=0.1002, over 22931.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001526, whisper_loss=0.09016, over 3854060.79 frames. ], batch size: 91, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:27:35,681 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-15 05:28:12,259 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 40 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 05:28:20,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3029650.0, ans=0.0 2024-08-15 05:28:29,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.54 vs. 
limit=22.5 2024-08-15 05:28:36,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.367e+01 2.624e+01 3.031e+01 1.630e+02, threshold=5.247e+01, percent-clipped=4.0 2024-08-15 05:28:36,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. limit=6.0 2024-08-15 05:28:37,622 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 05:28:47,926 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 05:28:50,976 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13150, loss[loss=0.0882, beats_loss=0.01376, ecapa_loss=0.0001444, whisper_loss=0.073, over 21244.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001523, whisper_loss=0.09091, over 3871361.53 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:29:33,825 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-15 05:29:37,358 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 05:29:58,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3030250.0, ans=0.0 2024-08-15 05:30:00,819 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 05:30:09,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13200, loss[loss=0.09852, beats_loss=0.01304, ecapa_loss=0.0001577, whisper_loss=0.08391, over 20588.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001528, whisper_loss=0.09129, over 3839230.64 frames. ], batch size: 84, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:30:18,766 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 05:30:33,992 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 05:30:49,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3030550.0, ans=0.2 2024-08-15 05:30:50,452 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 05:31:11,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.273e+01 2.515e+01 2.855e+01 4.648e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-15 05:31:12,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3030750.0, ans=0.125 2024-08-15 05:31:12,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3030750.0, ans=0.125 2024-08-15 05:31:16,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3030750.0, ans=0.0 2024-08-15 05:31:26,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13250, loss[loss=0.1127, beats_loss=0.009085, ecapa_loss=0.0001887, whisper_loss=0.1017, over 20394.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001539, whisper_loss=0.09124, over 3822878.72 frames. ], batch size: 87, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:31:27,066 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
19 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-15 05:31:40,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3030850.0, ans=0.125 2024-08-15 05:31:41,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3030950.0, ans=0.2 2024-08-15 05:31:54,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3030950.0, ans=0.125 2024-08-15 05:32:28,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3031250.0, ans=0.125 2024-08-15 05:32:41,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13300, loss[loss=0.1192, beats_loss=0.01036, ecapa_loss=0.0001854, whisper_loss=0.107, over 22441.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001532, whisper_loss=0.09119, over 3842401.77 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:32:54,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3031350.0, ans=0.125 2024-08-15 05:33:05,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2024-08-15 05:33:13,791 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 05:33:31,324 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 17 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 05:33:31,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3031650.0, ans=0.125 2024-08-15 05:33:32,924 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 05:33:41,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.389e+01 2.602e+01 2.951e+01 3.808e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-15 05:33:41,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3031750.0, ans=0.125 2024-08-15 05:33:45,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3031750.0, ans=0.125 2024-08-15 05:33:52,683 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 05:33:55,374 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13350, loss[loss=0.08058, beats_loss=0.009886, ecapa_loss=0.0001244, whisper_loss=0.06945, over 15579.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001529, whisper_loss=0.09145, over 3860031.96 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:33:58,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3031850.0, ans=0.0 2024-08-15 05:33:58,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3031850.0, ans=0.125 2024-08-15 05:34:10,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3031950.0, ans=0.125 2024-08-15 05:34:17,378 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
33 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 05:34:17,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3031950.0, ans=0.125 2024-08-15 05:34:19,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3031950.0, ans=0.0 2024-08-15 05:34:21,355 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 05:34:35,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3032050.0, ans=0.0 2024-08-15 05:35:06,185 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13400, loss[loss=0.08788, beats_loss=0.008717, ecapa_loss=0.0002013, whisper_loss=0.07715, over 13610.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001521, whisper_loss=0.09105, over 3837884.39 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:35:16,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3032350.0, ans=0.125 2024-08-15 05:35:17,350 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 14 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 05:35:18,641 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 05:35:22,338 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2024-08-15 05:35:22,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-15 05:35:25,739 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
39 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 05:35:47,795 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-15 05:36:02,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.320e+01 2.582e+01 2.828e+01 6.062e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-15 05:36:02,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3032750.0, ans=0.125 2024-08-15 05:36:18,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13450, loss[loss=0.1126, beats_loss=0.009241, ecapa_loss=0.0001474, whisper_loss=0.1019, over 22499.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001528, whisper_loss=0.09089, over 3870986.33 frames. ], batch size: 89, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:36:33,736 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 05:36:44,431 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 05:37:12,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3033150.0, ans=0.125 2024-08-15 05:37:13,146 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 05:37:13,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-08-15 05:37:16,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3033250.0, ans=0.125 2024-08-15 05:37:21,762 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. 
limit=12.0 2024-08-15 05:37:26,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3033250.0, ans=0.125 2024-08-15 05:37:30,428 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13500, loss[loss=0.1078, beats_loss=0.008724, ecapa_loss=0.0001667, whisper_loss=0.09742, over 19802.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001529, whisper_loss=0.09074, over 3896487.48 frames. ], batch size: 77, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:38:05,797 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 05:38:19,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3033650.0, ans=0.125 2024-08-15 05:38:20,282 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 05:38:26,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2024-08-15 05:38:26,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.315e+01 2.561e+01 2.861e+01 3.892e+01, threshold=5.123e+01, percent-clipped=0.0 2024-08-15 05:38:27,328 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-15 05:38:30,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3033750.0, ans=0.1 2024-08-15 05:38:31,552 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 05:38:33,357 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. 
limit=15.0 2024-08-15 05:38:35,443 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 05:38:41,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13550, loss[loss=0.1223, beats_loss=0.009744, ecapa_loss=0.000155, whisper_loss=0.111, over 21923.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001525, whisper_loss=0.09047, over 3894169.62 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:38:43,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=12.0 2024-08-15 05:38:47,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3033850.0, ans=0.1 2024-08-15 05:38:52,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3033850.0, ans=0.125 2024-08-15 05:39:15,693 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.853e+00 2024-08-15 05:39:31,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3034150.0, ans=0.0 2024-08-15 05:39:38,100 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 05:39:39,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-15 05:39:42,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3034250.0, ans=0.0 2024-08-15 05:39:53,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13600, loss[loss=0.1016, beats_loss=0.009343, ecapa_loss=0.0001554, whisper_loss=0.0907, over 16519.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.000152, whisper_loss=0.09021, over 3906506.16 frames. ], batch size: 67, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:39:59,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3034350.0, ans=0.1 2024-08-15 05:40:01,685 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 05:40:16,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3034450.0, ans=0.125 2024-08-15 05:40:18,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-08-15 05:40:46,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3034650.0, ans=0.125 2024-08-15 05:40:51,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0 2024-08-15 05:40:53,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.277e+01 2.545e+01 2.819e+01 3.866e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 05:40:54,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.03 vs. limit=22.5 2024-08-15 05:41:00,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3034750.0, ans=0.0 2024-08-15 05:41:08,344 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13650, loss[loss=0.1088, beats_loss=0.01011, ecapa_loss=0.0001419, whisper_loss=0.09726, over 22945.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.000153, whisper_loss=0.09033, over 3896707.73 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:41:08,448 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 05:41:27,617 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 05:41:39,276 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 05:41:41,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3035050.0, ans=0.125 2024-08-15 05:41:42,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3035050.0, ans=0.1 2024-08-15 05:41:50,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3035150.0, ans=0.125 2024-08-15 05:42:22,458 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13700, loss[loss=0.1149, beats_loss=0.00913, ecapa_loss=0.0001365, whisper_loss=0.1044, over 16024.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001523, whisper_loss=0.09074, over 3909907.07 frames. ], batch size: 62, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:42:27,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=12.0 2024-08-15 05:42:29,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3035350.0, ans=0.04949747468305833 2024-08-15 05:42:50,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. 
limit=15.0 2024-08-15 05:43:06,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3035650.0, ans=0.1 2024-08-15 05:43:15,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3035650.0, ans=0.0 2024-08-15 05:43:23,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.259e+01 2.458e+01 2.754e+01 9.155e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-15 05:43:25,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3035750.0, ans=0.125 2024-08-15 05:43:25,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3035750.0, ans=0.125 2024-08-15 05:43:38,632 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13750, loss[loss=0.1182, beats_loss=0.007612, ecapa_loss=0.000173, whisper_loss=0.1088, over 20291.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001513, whisper_loss=0.09026, over 3891306.41 frames. ], batch size: 79, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:43:39,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3035850.0, ans=0.0 2024-08-15 05:43:55,569 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 05:43:58,088 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 05:44:04,116 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-08-15 05:44:12,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.53 vs. 
limit=6.0 2024-08-15 05:44:14,646 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-15 05:44:39,923 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 05:44:41,982 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 05:44:42,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3036150.0, ans=0.0 2024-08-15 05:44:51,320 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 05:44:51,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3036250.0, ans=0.0 2024-08-15 05:45:00,496 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13800, loss[loss=0.109, beats_loss=0.01174, ecapa_loss=0.0001608, whisper_loss=0.09569, over 22179.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001515, whisper_loss=0.09022, over 3864743.32 frames. 
], batch size: 90, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:45:01,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3036350.0, ans=0.07 2024-08-15 05:45:06,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3036350.0, ans=0.1 2024-08-15 05:45:08,248 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.416e+01 2024-08-15 05:45:11,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3036350.0, ans=0.1 2024-08-15 05:45:11,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2024-08-15 05:46:04,350 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 05:46:05,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.299e+01 2.505e+01 2.770e+01 3.939e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-15 05:46:18,017 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 34 from Vox, 37 fro AS 2024-08-15 05:46:22,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13850, loss[loss=0.1161, beats_loss=0.01226, ecapa_loss=0.0001239, whisper_loss=0.1026, over 23088.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001524, whisper_loss=0.09094, over 3880073.12 frames. ], batch size: 89, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:46:24,948 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
23 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-15 05:46:32,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3036850.0, ans=0.125 2024-08-15 05:46:35,207 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 05:46:38,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3036950.0, ans=0.0 2024-08-15 05:46:56,124 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 05:46:57,328 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 05:47:00,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3037050.0, ans=0.0 2024-08-15 05:47:10,229 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 05:47:13,391 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 05:47:16,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3037150.0, ans=0.07 2024-08-15 05:47:18,672 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 05:47:19,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3037150.0, ans=0.1 2024-08-15 05:47:27,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3037250.0, ans=0.0 2024-08-15 05:47:31,748 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 05:47:36,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2024-08-15 05:47:40,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3037250.0, ans=0.125 2024-08-15 05:47:42,796 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13900, loss[loss=0.1014, beats_loss=0.01213, ecapa_loss=0.0001581, whisper_loss=0.08765, over 22847.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001527, whisper_loss=0.09068, over 3900861.51 frames. ], batch size: 94, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:48:01,127 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-15 05:48:14,046 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 05:48:22,691 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 33 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 05:48:31,293 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
25 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-15 05:48:36,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3037650.0, ans=0.125 2024-08-15 05:48:43,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3037650.0, ans=0.125 2024-08-15 05:48:44,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3037650.0, ans=0.2 2024-08-15 05:48:49,087 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05059259384870529, model_norm_threshold=50.10878372192383 2024-08-15 05:48:49,257 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.601e+05, grad_sumsq=3.601e+05, orig_rms_sq=1.000e+00 2024-08-15 05:48:51,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.379e+01 2.607e+01 2.936e+01 9.904e+02, threshold=5.213e+01, percent-clipped=4.0 2024-08-15 05:49:06,848 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 13950, loss[loss=0.1133, beats_loss=0.01085, ecapa_loss=0.0001548, whisper_loss=0.1009, over 19443.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001523, whisper_loss=0.09183, over 3920018.15 frames. ], batch size: 77, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:49:10,433 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. 
limit=15.0 2024-08-15 05:49:19,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3037850.0, ans=0.125 2024-08-15 05:49:20,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3037850.0, ans=0.125 2024-08-15 05:49:28,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3037950.0, ans=0.125 2024-08-15 05:49:28,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3037950.0, ans=0.2 2024-08-15 05:49:30,450 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 05:49:39,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3037950.0, ans=0.0 2024-08-15 05:49:42,673 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 05:49:46,818 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-15 05:49:47,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3038050.0, ans=0.125 2024-08-15 05:49:48,549 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
20 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-15 05:50:09,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3038150.0, ans=0.125 2024-08-15 05:50:43,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3038350.0, ans=0.0 2024-08-15 05:50:44,598 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14000, loss[loss=0.1097, beats_loss=0.008646, ecapa_loss=0.0001725, whisper_loss=0.09934, over 19258.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.000151, whisper_loss=0.09176, over 3916261.94 frames. ], batch size: 78, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:51:07,199 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-15 05:51:28,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.44 vs. limit=22.5 2024-08-15 05:51:45,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3038550.0, ans=0.125 2024-08-15 05:51:53,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3038650.0, ans=0.2 2024-08-15 05:52:02,728 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 05:52:03,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3038650.0, ans=0.125 2024-08-15 05:52:11,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.343e+01 2.615e+01 2.930e+01 6.184e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-15 05:52:35,271 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14050, loss[loss=0.07922, beats_loss=0.01341, ecapa_loss=0.0001484, whisper_loss=0.06432, over 14193.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001497, whisper_loss=0.09108, over 3904334.07 frames. ], batch size: 58, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:52:44,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3038850.0, ans=0.125 2024-08-15 05:52:50,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3038850.0, ans=0.0 2024-08-15 05:52:55,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. 
limit=22.5 2024-08-15 05:53:30,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3039050.0, ans=0.125 2024-08-15 05:53:33,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3039050.0, ans=0.0 2024-08-15 05:53:48,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3039150.0, ans=0.125 2024-08-15 05:53:53,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3039150.0, ans=0.2 2024-08-15 05:54:14,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3039250.0, ans=0.0 2024-08-15 05:54:18,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14100, loss[loss=0.09591, beats_loss=0.009847, ecapa_loss=0.0002135, whisper_loss=0.08392, over 15837.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001503, whisper_loss=0.0908, over 3895146.73 frames. ], batch size: 68, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:54:23,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3039350.0, ans=0.1 2024-08-15 05:54:38,192 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.538e-02 2024-08-15 05:54:47,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2024-08-15 05:54:51,173 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
20 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-15 05:55:04,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3039550.0, ans=0.0 2024-08-15 05:55:25,353 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 05:55:27,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.350e+01 2.664e+01 3.020e+01 1.564e+02, threshold=5.328e+01, percent-clipped=1.0 2024-08-15 05:55:32,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3039750.0, ans=0.0 2024-08-15 05:55:41,440 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14150, loss[loss=0.09853, beats_loss=0.01062, ecapa_loss=0.0001751, whisper_loss=0.08616, over 14784.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001507, whisper_loss=0.09041, over 3902147.12 frames. ], batch size: 60, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:55:41,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3039850.0, ans=0.125 2024-08-15 05:55:51,847 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-15 05:55:56,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3039950.0, ans=0.125 2024-08-15 05:56:07,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. 
limit=6.0 2024-08-15 05:56:14,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3040050.0, ans=0.125 2024-08-15 05:56:25,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3040050.0, ans=0.125 2024-08-15 05:56:42,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3040250.0, ans=0.125 2024-08-15 05:56:54,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3040250.0, ans=0.125 2024-08-15 05:56:58,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14200, loss[loss=0.08299, beats_loss=0.01174, ecapa_loss=0.0001175, whisper_loss=0.07007, over 19247.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01079, ecapa_loss=0.00015, whisper_loss=0.08992, over 3905772.13 frames. ], batch size: 76, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:57:00,346 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 05:57:03,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3040350.0, ans=0.09899494936611666 2024-08-15 05:57:15,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3040450.0, ans=0.125 2024-08-15 05:57:16,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3040450.0, ans=0.1 2024-08-15 05:57:33,439 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 05:57:44,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3040650.0, ans=0.05 2024-08-15 05:58:00,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.326e+01 2.592e+01 2.924e+01 6.304e+01, threshold=5.183e+01, percent-clipped=1.0 2024-08-15 05:58:00,546 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 05:58:15,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14250, loss[loss=0.09136, beats_loss=0.01442, ecapa_loss=9.872e-05, whisper_loss=0.07596, over 23532.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01081, ecapa_loss=0.0001492, whisper_loss=0.08985, over 3922571.72 frames. ], batch size: 92, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:58:44,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. 
limit=15.0 2024-08-15 05:58:53,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3041050.0, ans=0.1 2024-08-15 05:58:57,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3041050.0, ans=0.125 2024-08-15 05:58:57,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3041050.0, ans=0.2 2024-08-15 05:58:58,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3041050.0, ans=0.1 2024-08-15 05:59:00,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3041050.0, ans=0.125 2024-08-15 05:59:07,413 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 05:59:33,382 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 05:59:36,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14300, loss[loss=0.1149, beats_loss=0.009927, ecapa_loss=0.0001517, whisper_loss=0.1034, over 21803.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001489, whisper_loss=0.09013, over 3919553.59 frames. ], batch size: 87, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:59:38,580 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 05:59:53,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3041450.0, ans=0.125 2024-08-15 05:59:57,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3041450.0, ans=0.1 2024-08-15 06:00:00,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3041450.0, ans=0.09899494936611666 2024-08-15 06:00:22,787 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:00:24,077 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 06:00:44,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.480e+01 2.675e+01 2.988e+01 3.150e+02, threshold=5.350e+01, percent-clipped=2.0 2024-08-15 06:00:58,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3041750.0, ans=0.125 2024-08-15 06:01:01,724 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14350, loss[loss=0.08126, beats_loss=0.01233, ecapa_loss=0.0001706, whisper_loss=0.06723, over 21728.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001487, whisper_loss=0.09086, over 3936707.31 frames. ], batch size: 92, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:01:16,239 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-15 06:01:22,672 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 06:01:24,364 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 06:01:34,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3042050.0, ans=0.1 2024-08-15 06:01:53,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3042150.0, ans=0.125 2024-08-15 06:02:00,029 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 06:02:09,244 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 06:02:10,502 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 06:02:19,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14400, loss[loss=0.09547, beats_loss=0.01156, ecapa_loss=0.0001541, whisper_loss=0.08236, over 22054.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.0001492, whisper_loss=0.09048, over 3912175.54 frames. ], batch size: 90, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:02:42,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3042450.0, ans=0.125 2024-08-15 06:02:47,228 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 06:03:00,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3042550.0, ans=0.0 2024-08-15 06:03:00,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3042550.0, ans=0.125 2024-08-15 06:03:11,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3042650.0, ans=0.1 2024-08-15 06:03:18,025 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 06:03:23,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.353e+01 2.673e+01 3.020e+01 3.990e+01, threshold=5.347e+01, percent-clipped=0.0 2024-08-15 06:03:30,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3042750.0, ans=0.125 2024-08-15 06:03:32,289 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 06:03:35,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2024-08-15 06:03:40,343 INFO [train_multi_KD3.py:1116] (1/4) Epoch 21, batch 14450, loss[loss=0.1169, beats_loss=0.009411, ecapa_loss=0.0001823, whisper_loss=0.1056, over 16753.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001502, whisper_loss=0.09015, over 3903393.86 frames. ], batch size: 69, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:03:40,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3042850.0, ans=0.125 2024-08-15 06:03:41,861 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 06:04:01,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3042950.0, ans=0.0 2024-08-15 06:04:08,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3042950.0, ans=0.125 2024-08-15 06:04:10,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3042950.0, ans=0.0 2024-08-15 06:04:10,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3042950.0, ans=0.2 2024-08-15 06:04:29,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3043150.0, ans=0.125 2024-08-15 06:05:22,771 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 0, loss[loss=0.08305, beats_loss=0.01136, ecapa_loss=0.0001778, whisper_loss=0.0699, over 21672.00 frames. ], tot_loss[loss=0.08305, beats_loss=0.01136, ecapa_loss=0.0001778, whisper_loss=0.0699, over 21672.00 frames. ], batch size: 89, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:05:22,771 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 06:06:01,404 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005383, whisper_loss=0.2468, over 922467.00 frames. 2024-08-15 06:06:18,289 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on SV_voxceleb1: loss=0.004241, beats_loss=0, ecapa_loss=0.0004241, whisper_loss=0, over 939242.00 frames. 2024-08-15 06:08:04,718 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 06:08:04,721 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 06:08:21,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3043270.0, ans=0.2 2024-08-15 06:08:21,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=12.0 2024-08-15 06:08:49,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3043370.0, ans=0.1 2024-08-15 06:09:01,738 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 06:10:00,987 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.592e+01 2.838e+01 3.156e+01 2.932e+02, threshold=5.677e+01, percent-clipped=2.0 2024-08-15 06:10:01,140 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 06:10:01,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3043670.0, ans=0.1 2024-08-15 06:10:05,686 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 50, loss[loss=0.1074, beats_loss=0.01298, ecapa_loss=0.0001327, whisper_loss=0.09313, over 17459.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009848, ecapa_loss=0.0001584, whisper_loss=0.08947, over 892908.95 frames. ], batch size: 70, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:10:06,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3043770.0, ans=0.125 2024-08-15 06:10:52,340 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 06:11:22,622 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.283e-02 2024-08-15 06:11:24,383 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 06:11:24,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3044070.0, ans=0.09899494936611666 2024-08-15 06:11:26,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3044070.0, ans=0.1 2024-08-15 06:11:44,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3044170.0, ans=10.0 2024-08-15 06:11:57,264 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 100, loss[loss=0.09657, beats_loss=0.006588, ecapa_loss=0.0001621, whisper_loss=0.08836, over 15810.00 frames. ], tot_loss[loss=0.0998, beats_loss=0.009795, ecapa_loss=0.0001554, whisper_loss=0.08845, over 1537996.20 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:12:06,045 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 06:12:06,406 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.413e+01 2024-08-15 06:12:19,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3044370.0, ans=0.0 2024-08-15 06:12:36,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3044370.0, ans=0.125 2024-08-15 06:12:56,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3044470.0, ans=0.0 2024-08-15 06:12:58,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3044470.0, ans=0.1 2024-08-15 06:13:04,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3044470.0, ans=0.125 2024-08-15 06:13:17,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3044570.0, ans=15.0 2024-08-15 06:13:30,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3044570.0, ans=0.125 2024-08-15 06:13:30,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3044570.0, ans=0.125 2024-08-15 06:13:34,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3044670.0, ans=0.125 2024-08-15 06:13:40,268 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 06:13:49,411 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.691e+01 2.918e+01 3.263e+01 8.817e+01, threshold=5.837e+01, percent-clipped=1.0 2024-08-15 06:13:54,353 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 150, loss[loss=0.1186, beats_loss=0.009675, ecapa_loss=0.0001523, whisper_loss=0.1074, over 21920.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.009614, ecapa_loss=0.0001539, whisper_loss=0.09098, over 2048529.11 frames. ], batch size: 84, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:14:13,632 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 06:14:15,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3044870.0, ans=0.0 2024-08-15 06:14:17,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3044870.0, ans=0.125 2024-08-15 06:14:39,021 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 31 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-15 06:14:53,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3045070.0, ans=0.0 2024-08-15 06:14:59,959 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 06:15:02,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3045070.0, ans=0.2 2024-08-15 06:15:28,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 200, loss[loss=0.09306, beats_loss=0.01295, ecapa_loss=0.0001502, whisper_loss=0.0786, over 20493.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.009829, ecapa_loss=0.0001523, whisper_loss=0.09119, over 2415367.07 frames. 
], batch size: 83, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:15:34,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3045270.0, ans=0.2 2024-08-15 06:15:37,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3045270.0, ans=0.025 2024-08-15 06:15:37,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3045270.0, ans=10.0 2024-08-15 06:15:45,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3045370.0, ans=0.1 2024-08-15 06:15:47,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-15 06:15:52,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3045370.0, ans=0.2 2024-08-15 06:16:07,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3045470.0, ans=10.0 2024-08-15 06:16:13,916 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 06:16:23,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3045570.0, ans=0.125 2024-08-15 06:16:41,728 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 06:16:44,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.328e+01 2.566e+01 2.862e+01 5.342e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-15 06:16:47,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 250, loss[loss=0.1221, beats_loss=0.01168, ecapa_loss=0.0001177, whisper_loss=0.1092, over 19902.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009946, ecapa_loss=0.000154, whisper_loss=0.09095, over 2711020.98 frames. ], batch size: 77, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:16:53,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-08-15 06:17:11,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3045870.0, ans=0.0 2024-08-15 06:17:32,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3046070.0, ans=0.125 2024-08-15 06:17:36,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3046070.0, ans=0.125 2024-08-15 06:17:49,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3046170.0, ans=15.0 2024-08-15 06:18:00,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.43 vs. limit=10.0 2024-08-15 06:18:00,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.03 vs. 
limit=12.0 2024-08-15 06:18:03,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3046170.0, ans=0.2 2024-08-15 06:18:05,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 300, loss[loss=0.08343, beats_loss=0.01233, ecapa_loss=0.0001523, whisper_loss=0.06958, over 22035.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01029, ecapa_loss=0.0001508, whisper_loss=0.08905, over 2963437.93 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:18:27,215 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-15 06:18:52,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3046570.0, ans=0.2 2024-08-15 06:18:53,365 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 06:18:53,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3046570.0, ans=0.125 2024-08-15 06:18:55,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3046570.0, ans=0.125 2024-08-15 06:19:12,218 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-15 06:19:16,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. 
limit=15.0 2024-08-15 06:19:19,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.288e+01 2.599e+01 2.904e+01 1.999e+02, threshold=5.198e+01, percent-clipped=4.0 2024-08-15 06:19:20,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3046670.0, ans=0.0 2024-08-15 06:19:22,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 350, loss[loss=0.09774, beats_loss=0.01065, ecapa_loss=0.0001695, whisper_loss=0.08539, over 20708.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001515, whisper_loss=0.0891, over 3137784.88 frames. ], batch size: 83, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:19:38,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3046870.0, ans=0.125 2024-08-15 06:19:47,568 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 06:19:49,206 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 06:19:58,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3046970.0, ans=0.125 2024-08-15 06:20:01,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3046970.0, ans=10.0 2024-08-15 06:20:12,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3047070.0, ans=0.0 2024-08-15 06:20:29,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-08-15 06:20:37,759 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 06:20:40,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 400, loss[loss=0.1197, beats_loss=0.0112, ecapa_loss=0.0001472, whisper_loss=0.1071, over 22135.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01053, ecapa_loss=0.0001497, whisper_loss=0.08874, over 3307341.05 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:20:48,329 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 17 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 06:20:59,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3047370.0, ans=0.04949747468305833 2024-08-15 06:20:59,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.25 vs. limit=22.5 2024-08-15 06:21:05,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3047370.0, ans=0.2 2024-08-15 06:21:22,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3047470.0, ans=0.125 2024-08-15 06:21:29,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0 2024-08-15 06:21:45,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2024-08-15 06:21:54,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.307e+01 2.559e+01 2.888e+01 1.580e+02, threshold=5.118e+01, percent-clipped=5.0 2024-08-15 06:21:55,952 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 06:21:57,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 450, loss[loss=0.107, beats_loss=0.01154, ecapa_loss=0.0001335, whisper_loss=0.09417, over 17031.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.08888, over 3417469.65 frames. ], batch size: 65, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:22:04,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-15 06:22:16,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3047870.0, ans=0.1 2024-08-15 06:22:22,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3047870.0, ans=0.125 2024-08-15 06:22:53,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-15 06:22:54,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3048070.0, ans=0.1 2024-08-15 06:23:07,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3048170.0, ans=0.125 2024-08-15 06:23:14,456 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 500, loss[loss=0.08886, beats_loss=0.01188, ecapa_loss=0.0001328, whisper_loss=0.07565, over 18451.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01053, ecapa_loss=0.0001501, whisper_loss=0.08839, over 3497236.40 frames. 
], batch size: 73, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:23:16,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3048270.0, ans=0.2
2024-08-15 06:23:26,005 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS
2024-08-15 06:23:26,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3048270.0, ans=15.0
2024-08-15 06:23:29,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3048370.0, ans=0.125
2024-08-15 06:24:01,833 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS
2024-08-15 06:24:18,586 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-15 06:24:28,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.268e+01 2.600e+01 2.909e+01 8.676e+01, threshold=5.200e+01, percent-clipped=1.0
2024-08-15 06:24:31,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 550, loss[loss=0.1007, beats_loss=0.0116, ecapa_loss=0.0001197, whisper_loss=0.08785, over 14981.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.0001482, whisper_loss=0.08908, over 3606376.00 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:24:36,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3048770.0, ans=0.125
2024-08-15 06:24:40,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0
2024-08-15 06:24:51,969 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS
2024-08-15 06:24:57,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3048870.0, ans=0.0
2024-08-15 06:24:58,066 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 30 from Vox, 29 fro AS
2024-08-15 06:25:22,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3049070.0, ans=0.5
2024-08-15 06:25:34,885 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 14 from Vox, 37 fro AS
2024-08-15 06:25:48,292 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 600, loss[loss=0.09995, beats_loss=0.0102, ecapa_loss=0.0001693, whisper_loss=0.08806, over 21422.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001484, whisper_loss=0.08982, over 3667089.31 frames. ], batch size: 89, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:25:50,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3049270.0, ans=0.125
2024-08-15 06:25:55,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3049270.0, ans=0.1
2024-08-15 06:26:07,697 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 fro AS
2024-08-15 06:26:10,596 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 fro AS
2024-08-15 06:26:13,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3049370.0, ans=0.0
2024-08-15 06:26:24,342 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 fro AS
2024-08-15 06:26:27,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3049470.0, ans=0.0
2024-08-15 06:26:29,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3049470.0, ans=0.04949747468305833
2024-08-15 06:26:31,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3049470.0, ans=0.125
2024-08-15 06:26:42,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3049570.0, ans=0.125
2024-08-15 06:26:46,509 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS
2024-08-15 06:26:53,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049670.0, ans=0.1
2024-08-15 06:26:55,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3049670.0, ans=0.0
2024-08-15 06:26:55,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5
2024-08-15 06:26:57,219 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5
2024-08-15 06:27:03,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.381e+01 2.532e+01 2.729e+01 4.299e+01, threshold=5.065e+01, percent-clipped=0.0
2024-08-15 06:27:07,409 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 650, loss[loss=0.1193, beats_loss=0.01005, ecapa_loss=0.0001722, whisper_loss=0.1075, over 19655.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.000148, whisper_loss=0.08929, over 3717342.71 frames. ], batch size: 79, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:27:38,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3049970.0, ans=0.025
2024-08-15 06:28:00,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3050070.0, ans=0.125
2024-08-15 06:28:03,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3050070.0, ans=0.125
2024-08-15 06:28:23,812 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 700, loss[loss=0.09702, beats_loss=0.009942, ecapa_loss=0.0001797, whisper_loss=0.08528, over 19129.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001478, whisper_loss=0.09, over 3746825.25 frames. ], batch size: 78, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:28:43,865 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0
2024-08-15 06:28:53,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2024-08-15 06:28:56,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.59 vs. limit=15.0
2024-08-15 06:29:09,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3050470.0, ans=0.1
2024-08-15 06:29:10,347 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS
2024-08-15 06:29:34,291 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 17 from Vox, 34 fro AS
2024-08-15 06:29:37,209 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS
2024-08-15 06:29:38,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.268e+01 2.485e+01 2.982e+01 6.162e+01, threshold=4.969e+01, percent-clipped=2.0
2024-08-15 06:29:41,982 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 750, loss[loss=0.1049, beats_loss=0.009955, ecapa_loss=0.0001574, whisper_loss=0.09333, over 16151.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001479, whisper_loss=0.08993, over 3749375.21 frames. ], batch size: 64, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:29:46,591 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-15 06:29:46,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3050770.0, ans=0.125
2024-08-15 06:29:57,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3050870.0, ans=0.0
2024-08-15 06:29:59,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3050870.0, ans=0.125
2024-08-15 06:30:30,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0
2024-08-15 06:30:31,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3051070.0, ans=0.2
2024-08-15 06:30:35,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051070.0, ans=0.1
2024-08-15 06:30:40,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3051070.0, ans=0.125
2024-08-15 06:30:40,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3051070.0, ans=0.02
2024-08-15 06:30:43,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3051170.0, ans=0.0
2024-08-15 06:30:57,832 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 800, loss[loss=0.1001, beats_loss=0.01068, ecapa_loss=0.0001511, whisper_loss=0.08791, over 22332.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001486, whisper_loss=0.08977, over 3769748.88 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:31:24,133 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS
2024-08-15 06:31:28,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3051470.0, ans=0.125
2024-08-15 06:31:41,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3051470.0, ans=0.1
2024-08-15 06:31:59,981 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.937e-01
2024-08-15 06:32:00,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0
2024-08-15 06:32:11,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.305e+01 2.508e+01 2.943e+01 4.012e+02, threshold=5.016e+01, percent-clipped=1.0
2024-08-15 06:32:14,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 850, loss[loss=0.08998, beats_loss=0.01009, ecapa_loss=0.0001557, whisper_loss=0.07834, over 17075.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001499, whisper_loss=0.09032, over 3788927.95 frames. ], batch size: 68, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:32:15,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3051770.0, ans=0.025
2024-08-15 06:32:15,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051770.0, ans=0.1
2024-08-15 06:32:42,057 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 31 from LS+wenet, 17 from Vox, 29 fro AS
2024-08-15 06:32:45,621 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS
2024-08-15 06:32:45,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3051970.0, ans=0.0
2024-08-15 06:32:48,512 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 19 from Vox, 26 fro AS
2024-08-15 06:33:19,173 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS
2024-08-15 06:33:20,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3052170.0, ans=0.0
2024-08-15 06:33:21,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3052170.0, ans=0.025
2024-08-15 06:33:28,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3052170.0, ans=0.125
2024-08-15 06:33:28,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3052170.0, ans=0.0
2024-08-15 06:33:33,833 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 900, loss[loss=0.07253, beats_loss=0.01141, ecapa_loss=0.0001583, whisper_loss=0.05953, over 16369.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.08938, over 3786233.75 frames. ], batch size: 67, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:33:35,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3052270.0, ans=0.125
2024-08-15 06:33:42,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3052270.0, ans=0.0
2024-08-15 06:33:48,718 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.694e+01
2024-08-15 06:33:54,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3052370.0, ans=0.125
2024-08-15 06:33:54,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3052370.0, ans=0.125
2024-08-15 06:33:57,271 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 26 from Vox, 36 fro AS
2024-08-15 06:33:58,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.52 vs. limit=22.5
2024-08-15 06:34:25,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3052570.0, ans=0.1
2024-08-15 06:34:42,833 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 27 from Vox, 28 fro AS
2024-08-15 06:34:47,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.568e+01 3.064e+01 1.106e+02, threshold=5.136e+01, percent-clipped=1.0
2024-08-15 06:34:50,216 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 950, loss[loss=0.08959, beats_loss=0.01184, ecapa_loss=0.0001442, whisper_loss=0.07632, over 18032.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001487, whisper_loss=0.08933, over 3798187.10 frames. ], batch size: 76, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:34:50,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3052770.0, ans=0.125
2024-08-15 06:35:02,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052770.0, ans=0.1
2024-08-15 06:35:23,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3052970.0, ans=0.125
2024-08-15 06:35:25,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3052970.0, ans=0.2
2024-08-15 06:35:29,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3052970.0, ans=0.125
2024-08-15 06:35:39,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5
2024-08-15 06:35:41,095 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-15 06:35:44,130 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 06:35:44,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3053070.0, ans=0.125
2024-08-15 06:35:45,405 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 20 from LS+wenet, 28 from Vox, 37 fro AS
2024-08-15 06:36:00,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.87 vs. limit=22.5
2024-08-15 06:36:08,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1000, loss[loss=0.1176, beats_loss=0.008246, ecapa_loss=0.0001818, whisper_loss=0.1075, over 16383.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001482, whisper_loss=0.08949, over 3784116.11 frames. ], batch size: 63, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:36:12,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3053270.0, ans=0.2
2024-08-15 06:36:12,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3053270.0, ans=0.1
2024-08-15 06:36:28,434 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-15 06:36:33,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2024-08-15 06:36:36,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3053370.0, ans=0.0
2024-08-15 06:36:51,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3053470.0, ans=0.0
2024-08-15 06:37:01,558 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS
2024-08-15 06:37:18,409 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 fro AS
2024-08-15 06:37:23,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.275e+01 2.548e+01 2.900e+01 4.496e+01, threshold=5.097e+01, percent-clipped=0.0
2024-08-15 06:37:25,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3053770.0, ans=0.125
2024-08-15 06:37:26,111 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1050, loss[loss=0.1007, beats_loss=0.009901, ecapa_loss=0.0001643, whisper_loss=0.08919, over 16050.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001494, whisper_loss=0.08951, over 3777794.38 frames. ], batch size: 62, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:37:34,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3053770.0, ans=0.0
2024-08-15 06:37:39,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0
2024-08-15 06:37:41,381 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08964061737060547, model_norm_threshold=50.96524429321289
2024-08-15 06:37:41,566 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.115e+05, grad_sumsq=1.116e+07, orig_rms_sq=9.994e-03
2024-08-15 06:37:46,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3053870.0, ans=0.125
2024-08-15 06:38:06,697 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS
2024-08-15 06:38:10,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=12.0
2024-08-15 06:38:17,378 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 10 from Vox, 30 fro AS
2024-08-15 06:38:25,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0
2024-08-15 06:38:28,146 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 fro AS
2024-08-15 06:38:30,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3054170.0, ans=0.125
2024-08-15 06:38:31,592 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 12 from Vox, 33 fro AS
2024-08-15 06:38:43,793 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1100, loss[loss=0.0863, beats_loss=0.01141, ecapa_loss=0.0001189, whisper_loss=0.0737, over 15619.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001495, whisper_loss=0.09052, over 3811376.58 frames. ], batch size: 58, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:38:46,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3054270.0, ans=0.1
2024-08-15 06:38:50,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3054270.0, ans=0.125
2024-08-15 06:38:50,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0
2024-08-15 06:38:53,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3054270.0, ans=0.125
2024-08-15 06:39:39,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3054470.0, ans=0.125
2024-08-15 06:40:11,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3054570.0, ans=0.125
2024-08-15 06:40:32,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.402e+01 2.652e+01 3.039e+01 5.686e+02, threshold=5.304e+01, percent-clipped=1.0
2024-08-15 06:40:32,245 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 10 from Vox, 29 fro AS
2024-08-15 06:40:34,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1150, loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001694, whisper_loss=0.09114, over 21397.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001492, whisper_loss=0.09048, over 3811907.92 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:40:40,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3054770.0, ans=0.0
2024-08-15 06:40:43,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0
2024-08-15 06:40:54,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3054870.0, ans=0.0
2024-08-15 06:41:08,619 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 13 from Vox, 40 fro AS
2024-08-15 06:41:09,773 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS
2024-08-15 06:41:17,257 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0
2024-08-15 06:41:22,849 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS
2024-08-15 06:41:28,010 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 17 from Vox, 23 fro AS
2024-08-15 06:41:33,904 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 23 from Vox, 48 fro AS
2024-08-15 06:41:38,275 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS
2024-08-15 06:41:47,972 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS
2024-08-15 06:41:50,383 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 13 from Vox, 31 fro AS
2024-08-15 06:41:50,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3055170.0, ans=0.2
2024-08-15 06:41:51,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0
2024-08-15 06:41:56,787 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 fro AS
2024-08-15 06:42:03,245 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1200, loss[loss=0.1179, beats_loss=0.009064, ecapa_loss=0.0001713, whisper_loss=0.1072, over 19134.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001481, whisper_loss=0.09058, over 3825919.20 frames. ], batch size: 76, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:42:18,393 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS
2024-08-15 06:42:29,102 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS
2024-08-15 06:42:50,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3055470.0, ans=0.125
2024-08-15 06:42:51,996 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS
2024-08-15 06:43:01,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3055470.0, ans=0.1
2024-08-15 06:43:16,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3055570.0, ans=0.1
2024-08-15 06:43:18,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3055570.0, ans=0.0
2024-08-15 06:43:21,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3055570.0, ans=0.125
2024-08-15 06:43:41,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3055670.0, ans=0.125
2024-08-15 06:43:43,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.260e+01 2.457e+01 2.910e+01 3.777e+01, threshold=4.914e+01, percent-clipped=0.0
2024-08-15 06:43:47,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3055770.0, ans=0.0
2024-08-15 06:43:48,953 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1250, loss[loss=0.1019, beats_loss=0.00999, ecapa_loss=0.0001656, whisper_loss=0.09028, over 18488.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001471, whisper_loss=0.0914, over 3850418.94 frames. ], batch size: 74, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:44:02,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3055770.0, ans=0.0
2024-08-15 06:44:31,085 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 19 from Vox, 41 fro AS
2024-08-15 06:44:50,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3055970.0, ans=0.035
2024-08-15 06:45:37,605 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS
2024-08-15 06:45:51,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3056270.0, ans=0.125
2024-08-15 06:45:53,409 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1300, loss[loss=0.09464, beats_loss=0.01002, ecapa_loss=0.0001518, whisper_loss=0.08311, over 22162.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001483, whisper_loss=0.09135, over 3871005.32 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:46:34,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3056370.0, ans=0.1
2024-08-15 06:46:56,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3056470.0, ans=0.125
2024-08-15 06:47:03,677 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS
2024-08-15 06:47:03,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3056470.0, ans=0.125
2024-08-15 06:47:39,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3056670.0, ans=0.1
2024-08-15 06:47:44,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0
2024-08-15 06:47:51,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.277e+01 2.465e+01 2.867e+01 3.912e+01, threshold=4.931e+01, percent-clipped=0.0
2024-08-15 06:47:53,704 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS
2024-08-15 06:47:55,974 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1350, loss[loss=0.1054, beats_loss=0.01071, ecapa_loss=0.0001615, whisper_loss=0.09306, over 22571.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001472, whisper_loss=0.09066, over 3848709.47 frames. ], batch size: 93, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:47:56,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0
2024-08-15 06:48:47,826 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 19 from Vox, 20 fro AS
2024-08-15 06:49:27,895 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS
2024-08-15 06:49:46,543 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 15 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-15 06:49:48,980 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1400, loss[loss=0.1035, beats_loss=0.008787, ecapa_loss=0.0001765, whisper_loss=0.09298, over 19779.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001478, whisper_loss=0.09088, over 3833643.44 frames. ], batch size: 78, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:49:58,445 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS
2024-08-15 06:50:21,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3057370.0, ans=0.125
2024-08-15 06:50:32,403 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS
2024-08-15 06:50:36,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-08-15 06:51:09,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0
2024-08-15 06:51:13,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.207e+01 2.496e+01 2.856e+01 4.886e+01, threshold=4.993e+01, percent-clipped=0.0
2024-08-15 06:51:56,674 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1450, loss[loss=0.09457, beats_loss=0.01055, ecapa_loss=0.0001465, whisper_loss=0.08256, over 18985.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001481, whisper_loss=0.08998, over 3828392.45 frames. ], batch size: 74, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:52:02,150 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS
2024-08-15 06:52:04,002 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS
2024-08-15 06:52:04,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3057770.0, ans=0.125
2024-08-15 06:52:09,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057770.0, ans=0.1
2024-08-15 06:52:34,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3057970.0, ans=0.025
2024-08-15 06:52:47,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3057970.0, ans=0.125
2024-08-15 06:53:08,381 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS
2024-08-15 06:53:17,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3058170.0, ans=0.125
2024-08-15 06:53:28,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1500, loss[loss=0.1161, beats_loss=0.009661, ecapa_loss=0.0001302, whisper_loss=0.1052, over 16422.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01063, ecapa_loss=0.000147, whisper_loss=0.08898, over 3820358.01 frames. ], batch size: 63, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:53:56,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3058370.0, ans=0.125
2024-08-15 06:54:13,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3058470.0, ans=0.125
2024-08-15 06:54:15,691 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 fro AS
2024-08-15 06:54:33,397 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 15 from Vox, 45 fro AS
2024-08-15 06:54:50,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3058670.0, ans=0.0
2024-08-15 06:54:58,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.187e+01 2.410e+01 2.690e+01 4.725e+01, threshold=4.819e+01, percent-clipped=0.0
2024-08-15 06:55:01,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058770.0, ans=0.1
2024-08-15 06:55:02,226 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1550, loss[loss=0.09935, beats_loss=0.009766, ecapa_loss=0.0001441, whisper_loss=0.08815, over 19080.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.000146, whisper_loss=0.08945, over 3825460.60 frames. ], batch size: 77, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:55:15,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3058770.0, ans=0.125
2024-08-15 06:55:18,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0
2024-08-15 06:55:30,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3058870.0, ans=0.2
2024-08-15 06:55:34,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.82 vs. limit=5.0
2024-08-15 06:55:49,833 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-15 06:56:03,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3059070.0, ans=0.125
2024-08-15 06:56:11,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3059070.0, ans=0.2
2024-08-15 06:56:26,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3059170.0, ans=0.0
2024-08-15 06:56:32,670 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1600, loss[loss=0.0924, beats_loss=0.009549, ecapa_loss=0.0001538, whisper_loss=0.08132, over 18184.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001446, whisper_loss=0.08941, over 3843952.27 frames. ], batch size: 68, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 06:56:33,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3059270.0, ans=0.125
2024-08-15 06:56:38,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059270.0, ans=0.1
2024-08-15 06:56:38,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3059270.0, ans=0.125
2024-08-15 06:56:50,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3059370.0, ans=0.07
2024-08-15 06:56:50,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059370.0, ans=0.1
2024-08-15 06:56:53,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3059370.0, ans=0.07
2024-08-15 06:56:56,613 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS
2024-08-15 06:56:59,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3059370.0, ans=0.2
2024-08-15 06:57:21,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3059470.0, ans=0.125
2024-08-15 06:57:59,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.263e+01 2.473e+01 2.776e+01 4.144e+01, threshold=4.945e+01, percent-clipped=0.0
2024-08-15 06:58:02,338 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1650, loss[loss=0.1119, beats_loss=0.009284, ecapa_loss=0.0001667, whisper_loss=0.1009, over 22449.00 frames.
], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001456, whisper_loss=0.09022, over 3864590.66 frames. ], batch size: 90, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:58:14,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-15 06:58:36,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3059870.0, ans=0.1 2024-08-15 06:58:38,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059970.0, ans=0.1 2024-08-15 06:59:02,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3060070.0, ans=0.0 2024-08-15 06:59:18,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2024-08-15 06:59:20,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.89 vs. limit=10.0 2024-08-15 06:59:27,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2024-08-15 06:59:28,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3060170.0, ans=0.0 2024-08-15 06:59:30,751 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1700, loss[loss=0.0807, beats_loss=0.01209, ecapa_loss=0.0001146, whisper_loss=0.06747, over 16167.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001458, whisper_loss=0.09037, over 3876376.65 frames. 
], batch size: 61, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:59:45,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2024-08-15 06:59:47,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3060370.0, ans=0.0 2024-08-15 06:59:56,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3060370.0, ans=0.125 2024-08-15 06:59:59,017 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-15 07:00:00,331 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 15 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-15 07:00:03,775 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 07:00:04,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3060470.0, ans=0.125 2024-08-15 07:00:10,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-15 07:00:14,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.22 vs. 
limit=15.0 2024-08-15 07:00:15,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3060470.0, ans=0.0 2024-08-15 07:00:15,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3060470.0, ans=0.2 2024-08-15 07:00:17,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3060470.0, ans=0.0 2024-08-15 07:00:43,482 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 07:00:51,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.353e+01 2.609e+01 2.862e+01 3.979e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-15 07:00:54,835 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1750, loss[loss=0.106, beats_loss=0.01162, ecapa_loss=0.0001169, whisper_loss=0.09319, over 21369.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001458, whisper_loss=0.08915, over 3864016.68 frames. ], batch size: 82, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:00:56,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3060770.0, ans=0.2 2024-08-15 07:01:01,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3060770.0, ans=0.125 2024-08-15 07:01:19,970 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 07:01:24,750 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 07:01:26,012 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 07:01:31,273 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 07:01:32,505 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 07:01:49,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3061070.0, ans=0.2 2024-08-15 07:02:08,734 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-15 07:02:15,283 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1800, loss[loss=0.108, beats_loss=0.01096, ecapa_loss=0.0001514, whisper_loss=0.09556, over 23199.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001465, whisper_loss=0.08913, over 3869439.04 frames. ], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:02:46,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061470.0, ans=0.1 2024-08-15 07:03:02,459 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-15 07:03:06,306 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 07:03:24,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. 
limit=12.0 2024-08-15 07:03:31,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.281e+01 2.524e+01 2.715e+01 4.496e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-15 07:03:31,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3061670.0, ans=0.125 2024-08-15 07:03:33,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-15 07:03:35,156 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1850, loss[loss=0.1082, beats_loss=0.008536, ecapa_loss=0.000169, whisper_loss=0.098, over 21345.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001482, whisper_loss=0.08904, over 3839920.47 frames. ], batch size: 89, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:03:35,583 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-15 07:03:45,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2024-08-15 07:03:53,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3061870.0, ans=0.2 2024-08-15 07:03:57,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3061870.0, ans=0.2 2024-08-15 07:03:59,611 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 07:04:06,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061970.0, ans=0.1 2024-08-15 07:04:10,136 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
32 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-15 07:04:17,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2024-08-15 07:04:19,424 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-15 07:04:32,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3062070.0, ans=0.125 2024-08-15 07:04:39,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3062170.0, ans=0.125 2024-08-15 07:04:39,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=12.0 2024-08-15 07:04:42,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3062170.0, ans=0.125 2024-08-15 07:04:53,528 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 07:04:55,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1900, loss[loss=0.09844, beats_loss=0.01005, ecapa_loss=0.0001506, whisper_loss=0.08688, over 23309.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.0001478, whisper_loss=0.08923, over 3862179.10 frames. ], batch size: 94, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:04:56,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3062270.0, ans=0.0 2024-08-15 07:05:09,382 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 07:05:35,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3062470.0, ans=0.125 2024-08-15 07:05:39,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3062470.0, ans=0.0 2024-08-15 07:05:39,415 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=12.0 2024-08-15 07:05:41,798 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 07:06:00,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-08-15 07:06:12,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.578e+01 2.950e+01 1.570e+02, threshold=5.156e+01, percent-clipped=1.0 2024-08-15 07:06:14,692 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 07:06:15,755 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 1950, loss[loss=0.09335, beats_loss=0.01159, ecapa_loss=0.000136, whisper_loss=0.08041, over 22451.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.0001479, whisper_loss=0.08892, over 3850419.47 frames. 
], batch size: 88, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:06:16,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3062770.0, ans=0.125 2024-08-15 07:06:30,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3062870.0, ans=0.2 2024-08-15 07:06:39,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3062870.0, ans=0.0 2024-08-15 07:06:40,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=12.0 2024-08-15 07:06:41,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062870.0, ans=0.1 2024-08-15 07:07:01,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3063070.0, ans=0.125 2024-08-15 07:07:04,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3063070.0, ans=0.1 2024-08-15 07:07:16,117 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 07:07:25,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3063170.0, ans=0.125 2024-08-15 07:07:30,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3063170.0, ans=0.1 2024-08-15 07:07:35,136 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2000, loss[loss=0.09408, beats_loss=0.01078, ecapa_loss=0.0001513, whisper_loss=0.08179, over 19470.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01055, ecapa_loss=0.0001477, whisper_loss=0.08842, over 3846359.76 frames. 
], batch size: 81, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:07:39,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.75 vs. limit=10.0 2024-08-15 07:07:44,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3063270.0, ans=0.0 2024-08-15 07:07:48,159 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 07:07:56,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3063370.0, ans=0.125 2024-08-15 07:07:57,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3063370.0, ans=0.0 2024-08-15 07:07:57,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3063370.0, ans=0.125 2024-08-15 07:08:11,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3063470.0, ans=0.0 2024-08-15 07:08:18,288 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 07:08:18,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3063470.0, ans=0.125 2024-08-15 07:08:32,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3063570.0, ans=0.125 2024-08-15 07:08:37,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. 
limit=15.0 2024-08-15 07:08:46,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3063670.0, ans=0.0 2024-08-15 07:08:49,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3063670.0, ans=0.2 2024-08-15 07:08:51,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.304e+01 2.556e+01 2.865e+01 6.565e+01, threshold=5.113e+01, percent-clipped=1.0 2024-08-15 07:08:54,452 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2050, loss[loss=0.1337, beats_loss=0.006788, ecapa_loss=0.0001933, whisper_loss=0.125, over 16761.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01064, ecapa_loss=0.0001462, whisper_loss=0.08809, over 3823639.68 frames. ], batch size: 66, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:09:01,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3063770.0, ans=0.0 2024-08-15 07:09:06,565 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 07:09:08,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.71 vs. limit=22.5 2024-08-15 07:09:17,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3063870.0, ans=10.0 2024-08-15 07:09:22,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3063870.0, ans=0.125 2024-08-15 07:09:22,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. 
limit=15.0 2024-08-15 07:09:38,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-15 07:09:38,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3063970.0, ans=0.0 2024-08-15 07:09:41,587 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 33 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 07:09:42,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3064070.0, ans=0.125 2024-08-15 07:09:46,543 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 07:09:53,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3064070.0, ans=10.0 2024-08-15 07:09:59,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=12.0 2024-08-15 07:10:12,791 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2100, loss[loss=0.1008, beats_loss=0.009161, ecapa_loss=0.0001601, whisper_loss=0.08999, over 20384.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01059, ecapa_loss=0.0001465, whisper_loss=0.08876, over 3806747.83 frames. ], batch size: 84, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:10:16,640 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-15 07:10:49,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3064470.0, ans=0.2 2024-08-15 07:10:55,736 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 07:10:56,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3064470.0, ans=0.025 2024-08-15 07:11:28,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.342e+01 2.621e+01 2.964e+01 3.863e+02, threshold=5.241e+01, percent-clipped=3.0 2024-08-15 07:11:32,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2150, loss[loss=0.1057, beats_loss=0.01165, ecapa_loss=0.0001073, whisper_loss=0.09299, over 20520.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.000147, whisper_loss=0.08996, over 3820647.69 frames. ], batch size: 77, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:11:33,122 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 07:11:40,451 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 07:11:43,456 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 07:11:53,379 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 9 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 07:11:54,165 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. 
limit=22.5 2024-08-15 07:11:55,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3064870.0, ans=0.125 2024-08-15 07:11:57,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3064870.0, ans=0.125 2024-08-15 07:11:57,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3064870.0, ans=0.125 2024-08-15 07:12:00,103 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 30 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 07:12:03,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3064970.0, ans=0.1 2024-08-15 07:12:06,458 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 07:12:22,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3065070.0, ans=0.125 2024-08-15 07:12:25,099 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2024-08-15 07:12:39,371 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 20 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-15 07:12:48,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3065170.0, ans=0.125 2024-08-15 07:12:50,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3065170.0, ans=0.125 2024-08-15 07:12:50,321 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.73 vs. 
limit=15.0 2024-08-15 07:12:53,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2200, loss[loss=0.09379, beats_loss=0.01182, ecapa_loss=0.0001555, whisper_loss=0.08041, over 19515.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001458, whisper_loss=0.08961, over 3812104.40 frames. ], batch size: 82, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:12:56,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3065270.0, ans=0.125 2024-08-15 07:12:59,257 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-15 07:13:01,340 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 07:13:24,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3065470.0, ans=0.125 2024-08-15 07:13:50,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2024-08-15 07:14:09,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2024-08-15 07:14:09,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.287e+01 2.531e+01 2.773e+01 4.088e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 07:14:12,891 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2250, loss[loss=0.1004, beats_loss=0.009466, ecapa_loss=0.0001714, whisper_loss=0.08926, over 17519.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001467, whisper_loss=0.08993, over 3835294.61 frames. 
], batch size: 71, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:14:21,178 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-15 07:14:30,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3065870.0, ans=0.125 2024-08-15 07:14:30,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3065870.0, ans=0.2 2024-08-15 07:14:39,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3065870.0, ans=0.1 2024-08-15 07:15:34,200 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2300, loss[loss=0.09064, beats_loss=0.01108, ecapa_loss=0.000158, whisper_loss=0.07798, over 20318.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001459, whisper_loss=0.09069, over 3867305.60 frames. ], batch size: 85, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:15:45,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3066270.0, ans=0.125 2024-08-15 07:15:51,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3066370.0, ans=0.125 2024-08-15 07:15:57,116 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 07:16:08,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3066470.0, ans=0.125 2024-08-15 07:16:08,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3066470.0, ans=0.125 2024-08-15 07:16:30,560 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 07:16:48,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3066670.0, ans=0.125 2024-08-15 07:16:50,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.352e+01 2.574e+01 2.900e+01 4.946e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-15 07:16:52,518 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 07:16:53,866 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2350, loss[loss=0.1002, beats_loss=0.01128, ecapa_loss=0.0001247, whisper_loss=0.0877, over 24027.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001472, whisper_loss=0.09125, over 3857501.14 frames. ], batch size: 94, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:16:54,058 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 07:16:57,434 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 07:17:12,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3066870.0, ans=0.0 2024-08-15 07:17:17,045 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 23 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-15 07:17:45,204 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 07:17:54,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3067070.0, ans=0.2 2024-08-15 07:18:06,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.27 vs. 
limit=15.0 2024-08-15 07:18:07,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2024-08-15 07:18:11,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3067270.0, ans=0.0 2024-08-15 07:18:12,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2400, loss[loss=0.1007, beats_loss=0.01094, ecapa_loss=0.0001743, whisper_loss=0.08804, over 16901.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001486, whisper_loss=0.09156, over 3888442.47 frames. ], batch size: 69, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:18:46,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3067470.0, ans=0.1 2024-08-15 07:18:53,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3067470.0, ans=0.0 2024-08-15 07:18:58,016 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-15 07:19:13,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=12.0 2024-08-15 07:19:26,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.236e+01 2.451e+01 2.781e+01 1.373e+02, threshold=4.902e+01, percent-clipped=2.0 2024-08-15 07:19:29,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2450, loss[loss=0.1111, beats_loss=0.0103, ecapa_loss=0.0001248, whisper_loss=0.09952, over 22865.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001478, whisper_loss=0.09109, over 3892570.12 frames. 
], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:19:29,974 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 07:19:43,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3067770.0, ans=0.2 2024-08-15 07:20:07,021 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 07:20:34,561 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-15 07:20:48,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2500, loss[loss=0.1273, beats_loss=0.007296, ecapa_loss=0.0002069, whisper_loss=0.118, over 17944.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001503, whisper_loss=0.09148, over 3875831.06 frames. ], batch size: 74, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:20:49,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-08-15 07:20:56,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3068270.0, ans=10.0 2024-08-15 07:21:29,786 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 18 from Vox, 52 fro AS 2024-08-15 07:21:52,270 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 07:21:54,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3068670.0, ans=0.0 2024-08-15 07:21:58,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. 
limit=15.0 2024-08-15 07:22:03,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3068670.0, ans=0.125 2024-08-15 07:22:04,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.249e+01 2.498e+01 2.918e+01 4.518e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-15 07:22:07,189 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2550, loss[loss=0.1197, beats_loss=0.009336, ecapa_loss=0.0001639, whisper_loss=0.1087, over 22505.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001494, whisper_loss=0.09145, over 3872851.67 frames. ], batch size: 92, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:22:30,819 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 07:22:37,110 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 07:22:45,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3068970.0, ans=0.025 2024-08-15 07:23:16,239 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 07:23:20,100 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-15 07:23:25,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2600, loss[loss=0.09879, beats_loss=0.01026, ecapa_loss=0.0001793, whisper_loss=0.08673, over 15954.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001498, whisper_loss=0.09076, over 3882125.63 frames. 
], batch size: 66, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:23:55,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3069470.0, ans=0.2 2024-08-15 07:24:07,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069470.0, ans=0.1 2024-08-15 07:24:10,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3069470.0, ans=0.1 2024-08-15 07:24:12,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-08-15 07:24:30,126 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 07:24:35,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3069670.0, ans=0.125 2024-08-15 07:24:36,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3069670.0, ans=0.125 2024-08-15 07:24:40,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.364e+01 2.551e+01 2.920e+01 2.244e+02, threshold=5.103e+01, percent-clipped=2.0 2024-08-15 07:24:43,304 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2650, loss[loss=0.09326, beats_loss=0.01137, ecapa_loss=0.0001721, whisper_loss=0.08017, over 21670.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001495, whisper_loss=0.09028, over 3841180.71 frames. ], batch size: 92, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:26:02,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2700, loss[loss=0.117, beats_loss=0.01003, ecapa_loss=0.0001536, whisper_loss=0.1054, over 20260.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001498, whisper_loss=0.08983, over 3865411.92 frames. ], batch size: 79, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:26:07,176 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 07:26:15,480 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 07:26:19,940 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-15 07:26:20,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=12.0 2024-08-15 07:26:39,108 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 07:26:54,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-08-15 07:26:55,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3070570.0, ans=0.0 2024-08-15 07:27:04,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3070670.0, ans=0.125 2024-08-15 07:27:17,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.236e+01 2.522e+01 2.732e+01 2.329e+02, threshold=5.045e+01, percent-clipped=1.0 2024-08-15 07:27:21,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2750, loss[loss=0.09503, beats_loss=0.008928, ecapa_loss=0.0001597, whisper_loss=0.08451, over 17857.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.000149, whisper_loss=0.09004, over 3853836.40 frames. 
], batch size: 70, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:27:28,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=12.0 2024-08-15 07:27:54,087 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-15 07:27:55,498 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 07:28:00,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3070970.0, ans=0.125 2024-08-15 07:28:01,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.48 vs. limit=15.0 2024-08-15 07:28:02,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-15 07:28:15,819 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 07:28:28,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3071170.0, ans=0.125 2024-08-15 07:28:28,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3071170.0, ans=0.05 2024-08-15 07:28:39,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2800, loss[loss=0.1181, beats_loss=0.008539, ecapa_loss=0.0001704, whisper_loss=0.1079, over 22372.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001492, whisper_loss=0.09079, over 3883824.36 frames. ], batch size: 89, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:28:40,294 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 07:28:40,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3071270.0, ans=0.0 2024-08-15 07:29:06,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3071370.0, ans=0.0 2024-08-15 07:29:20,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3071470.0, ans=0.125 2024-08-15 07:29:22,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3071470.0, ans=0.125 2024-08-15 07:29:32,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3071570.0, ans=0.0 2024-08-15 07:29:40,735 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 36 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 07:29:41,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3071570.0, ans=0.125 2024-08-15 07:29:51,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3071670.0, ans=0.0 2024-08-15 07:29:57,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.358e+01 2.558e+01 2.902e+01 4.968e+01, threshold=5.116e+01, percent-clipped=0.0 2024-08-15 07:29:59,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3071770.0, ans=0.0 2024-08-15 07:30:00,190 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2850, loss[loss=0.1072, beats_loss=0.01026, ecapa_loss=0.0001488, whisper_loss=0.09547, over 18819.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001503, whisper_loss=0.09075, over 3889052.96 frames. 
], batch size: 74, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:30:26,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-15 07:30:28,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3071870.0, ans=0.0 2024-08-15 07:30:42,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3071970.0, ans=0.125 2024-08-15 07:30:58,384 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-15 07:31:18,337 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2900, loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001218, whisper_loss=0.09178, over 22759.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001511, whisper_loss=0.09058, over 3855343.52 frames. ], batch size: 86, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:31:26,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3072270.0, ans=0.2 2024-08-15 07:31:45,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3072370.0, ans=0.04949747468305833 2024-08-15 07:32:14,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=12.0 2024-08-15 07:32:15,135 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 07:32:32,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.406e+01 2.599e+01 2.781e+01 4.544e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-15 07:32:35,983 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 2950, loss[loss=0.115, beats_loss=0.006923, ecapa_loss=0.0001506, whisper_loss=0.1065, over 21013.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001508, whisper_loss=0.0913, over 3883849.12 frames. ], batch size: 80, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:32:39,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3072770.0, ans=0.0 2024-08-15 07:32:42,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3072770.0, ans=0.0 2024-08-15 07:32:54,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3072870.0, ans=0.125 2024-08-15 07:32:56,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3072870.0, ans=0.1 2024-08-15 07:33:03,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-08-15 07:33:26,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3073070.0, ans=0.125 2024-08-15 07:33:29,593 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 07:33:48,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3000, loss[loss=0.07539, beats_loss=0.01255, ecapa_loss=0.000164, whisper_loss=0.0612, over 17197.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001506, whisper_loss=0.09108, over 3897479.76 frames. ], batch size: 73, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:33:48,957 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 07:34:30,938 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005255, whisper_loss=0.2469, over 922467.00 frames. 2024-08-15 07:34:46,440 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on SV_voxceleb1: loss=0.004113, beats_loss=0, ecapa_loss=0.0004113, whisper_loss=0, over 939242.00 frames. 2024-08-15 07:36:48,571 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 07:36:48,575 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 07:36:49,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3073270.0, ans=0.09899494936611666 2024-08-15 07:37:19,570 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 07:37:41,182 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 07:37:44,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3073570.0, ans=0.125 2024-08-15 07:37:47,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3073670.0, ans=0.125 2024-08-15 07:37:51,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.87 vs. 
limit=22.5 2024-08-15 07:37:59,411 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2024-08-15 07:37:59,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.341e+01 2.586e+01 2.957e+01 4.096e+01, threshold=5.172e+01, percent-clipped=0.0 2024-08-15 07:38:02,523 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3050, loss[loss=0.09277, beats_loss=0.01313, ecapa_loss=0.0001038, whisper_loss=0.0786, over 22447.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.0001508, whisper_loss=0.0916, over 3937788.25 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:38:23,369 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 07:38:24,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3073870.0, ans=0.125 2024-08-15 07:38:27,550 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 07:38:34,615 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-15 07:38:36,026 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 07:38:37,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3073970.0, ans=0.0 2024-08-15 07:38:41,885 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
17 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-15 07:38:42,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3073970.0, ans=0.125 2024-08-15 07:38:46,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3074070.0, ans=0.125 2024-08-15 07:38:58,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074070.0, ans=0.125 2024-08-15 07:39:09,504 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 07:39:11,224 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 07:39:15,044 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3100, loss[loss=0.1037, beats_loss=0.0111, ecapa_loss=0.0001679, whisper_loss=0.09094, over 22214.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001519, whisper_loss=0.09162, over 3923024.39 frames. ], batch size: 94, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:39:22,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3074270.0, ans=0.0 2024-08-15 07:39:26,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3074270.0, ans=0.125 2024-08-15 07:39:35,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3074370.0, ans=0.0 2024-08-15 07:39:38,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.22 vs. 
limit=15.0 2024-08-15 07:39:41,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.97 vs. limit=10.0 2024-08-15 07:39:45,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-15 07:39:49,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3074470.0, ans=0.0 2024-08-15 07:39:49,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3074470.0, ans=0.125 2024-08-15 07:39:58,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3074570.0, ans=0.0 2024-08-15 07:40:17,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3074670.0, ans=0.0 2024-08-15 07:40:18,422 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.222e+01 2024-08-15 07:40:23,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.264e+01 2.554e+01 2.842e+01 4.812e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-15 07:40:26,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3150, loss[loss=0.1101, beats_loss=0.009949, ecapa_loss=0.0001787, whisper_loss=0.09838, over 22249.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01057, ecapa_loss=0.0001526, whisper_loss=0.0915, over 3932019.46 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:40:56,261 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 07:41:13,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-15 07:41:16,958 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 07:41:27,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3075170.0, ans=0.0 2024-08-15 07:41:33,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-08-15 07:41:38,383 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 07:41:39,642 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3200, loss[loss=0.1056, beats_loss=0.009918, ecapa_loss=0.0001615, whisper_loss=0.09402, over 15839.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001516, whisper_loss=0.09157, over 3896862.56 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:42:06,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3075370.0, ans=0.1 2024-08-15 07:42:25,095 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 07:42:48,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3075670.0, ans=0.0 2024-08-15 07:42:48,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.323e+01 2.639e+01 2.854e+01 4.930e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-15 07:42:49,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3075670.0, ans=0.2 2024-08-15 07:42:51,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3250, loss[loss=0.09343, beats_loss=0.01142, ecapa_loss=0.0001521, whisper_loss=0.0805, over 21481.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001521, whisper_loss=0.09146, over 3903589.48 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:42:55,041 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 07:43:00,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3075770.0, ans=0.0 2024-08-15 07:43:02,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=17.47 vs. limit=15.0 2024-08-15 07:43:19,663 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 07:43:23,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3075970.0, ans=0.125 2024-08-15 07:43:29,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3075970.0, ans=0.0 2024-08-15 07:43:32,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3076070.0, ans=0.0 2024-08-15 07:43:51,785 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 07:43:52,962 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-15 07:44:00,610 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-15 07:44:01,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3300, loss[loss=0.08763, beats_loss=0.01281, ecapa_loss=0.0001288, whisper_loss=0.07354, over 13790.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001527, whisper_loss=0.0911, over 3887005.52 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:44:10,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.32 vs. 
limit=15.0 2024-08-15 07:44:20,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3076370.0, ans=0.125 2024-08-15 07:44:27,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3076370.0, ans=0.2 2024-08-15 07:44:31,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3076470.0, ans=0.0 2024-08-15 07:44:32,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3076470.0, ans=0.125 2024-08-15 07:44:42,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3076470.0, ans=0.125 2024-08-15 07:44:53,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.82 vs. limit=10.0 2024-08-15 07:44:56,192 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-15 07:45:06,473 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=12.0 2024-08-15 07:45:11,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.361e+01 2.617e+01 2.908e+01 9.847e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-15 07:45:13,919 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3350, loss[loss=0.0972, beats_loss=0.01369, ecapa_loss=0.0001595, whisper_loss=0.08192, over 21895.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001511, whisper_loss=0.09064, over 3880596.24 frames. 
], batch size: 94, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:45:37,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3076870.0, ans=0.0 2024-08-15 07:45:43,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.26 vs. limit=10.0 2024-08-15 07:45:44,027 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 07:46:04,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=15.0 2024-08-15 07:46:24,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3400, loss[loss=0.07936, beats_loss=0.01112, ecapa_loss=0.0001515, whisper_loss=0.06673, over 15204.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001505, whisper_loss=0.09074, over 3893176.64 frames. ], batch size: 60, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:46:33,402 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 07:46:46,969 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 07:47:07,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.97 vs. limit=5.0 2024-08-15 07:47:07,679 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 07:47:13,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3077570.0, ans=0.125 2024-08-15 07:47:17,738 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-15 07:47:24,805 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 07:47:26,260 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 07:47:27,578 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 07:47:32,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.325e+01 2.575e+01 2.904e+01 4.420e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-15 07:47:35,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3450, loss[loss=0.1015, beats_loss=0.01301, ecapa_loss=0.0001166, whisper_loss=0.08733, over 23075.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001506, whisper_loss=0.09057, over 3918541.60 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:47:51,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3077870.0, ans=0.1 2024-08-15 07:48:13,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3077970.0, ans=0.125 2024-08-15 07:48:22,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2024-08-15 07:48:25,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3078070.0, ans=0.125 2024-08-15 07:48:33,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3078170.0, ans=0.1 2024-08-15 07:48:49,216 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3500, loss[loss=0.1223, beats_loss=0.005706, ecapa_loss=0.0001811, whisper_loss=0.1148, over 16849.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001511, whisper_loss=0.09112, over 3904439.87 frames. 
], batch size: 63, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:49:01,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3078270.0, ans=0.1 2024-08-15 07:49:31,319 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.71 vs. limit=22.5 2024-08-15 07:49:34,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3078570.0, ans=0.0 2024-08-15 07:49:44,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3078570.0, ans=0.125 2024-08-15 07:49:55,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3078670.0, ans=0.125 2024-08-15 07:49:57,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.337e+01 2.595e+01 2.865e+01 3.542e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-15 07:49:57,666 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 07:49:59,964 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3550, loss[loss=0.08316, beats_loss=0.01387, ecapa_loss=0.0001284, whisper_loss=0.068, over 21962.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001509, whisper_loss=0.09002, over 3898234.97 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:50:07,167 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 07:50:21,059 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 07:50:39,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3078970.0, ans=0.125 2024-08-15 07:51:02,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3079170.0, ans=0.2 2024-08-15 07:51:12,187 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3600, loss[loss=0.09058, beats_loss=0.01016, ecapa_loss=0.0001617, whisper_loss=0.0788, over 16416.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001515, whisper_loss=0.09059, over 3909438.46 frames. ], batch size: 67, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:51:12,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3079270.0, ans=0.125 2024-08-15 07:51:19,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3079270.0, ans=0.125 2024-08-15 07:51:37,252 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0 2024-08-15 07:51:46,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3079470.0, ans=0.125 2024-08-15 07:51:48,453 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 35 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 07:51:57,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3079570.0, ans=0.0 2024-08-15 07:52:07,732 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 07:52:09,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3079670.0, ans=0.2 2024-08-15 07:52:09,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.69 vs. limit=6.0 2024-08-15 07:52:10,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3079670.0, ans=0.0 2024-08-15 07:52:10,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3079670.0, ans=0.0 2024-08-15 07:52:10,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3079670.0, ans=0.125 2024-08-15 07:52:21,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-15 07:52:21,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.253e+01 2.466e+01 2.769e+01 4.270e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-15 07:52:21,967 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 07:52:22,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3079670.0, ans=0.2 2024-08-15 07:52:24,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3650, loss[loss=0.08533, beats_loss=0.0137, ecapa_loss=0.0001281, whisper_loss=0.07035, over 17146.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.000151, whisper_loss=0.09014, over 3897893.08 frames. 
], batch size: 71, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:52:32,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3079770.0, ans=0.0 2024-08-15 07:52:40,361 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-15 07:52:43,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3079870.0, ans=0.125 2024-08-15 07:53:09,153 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 07:53:38,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3700, loss[loss=0.1044, beats_loss=0.01238, ecapa_loss=0.0001495, whisper_loss=0.09052, over 20259.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001511, whisper_loss=0.09043, over 3885524.29 frames. ], batch size: 82, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:53:45,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3080270.0, ans=0.125 2024-08-15 07:53:51,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-15 07:53:57,766 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 07:53:58,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3080370.0, ans=0.125 2024-08-15 07:53:59,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3080370.0, ans=0.1 2024-08-15 07:54:05,830 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 07:54:16,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-08-15 07:54:20,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3080570.0, ans=0.125 2024-08-15 07:54:21,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3080570.0, ans=0.125 2024-08-15 07:54:32,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3080670.0, ans=0.125 2024-08-15 07:54:35,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3080670.0, ans=0.125 2024-08-15 07:54:39,692 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 07:54:45,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.379e+01 2.603e+01 2.924e+01 1.234e+02, threshold=5.207e+01, percent-clipped=1.0 2024-08-15 07:54:46,755 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 07:54:47,835 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3750, loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.000144, whisper_loss=0.09127, over 22211.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01075, ecapa_loss=0.0001514, whisper_loss=0.08954, over 3872234.15 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:54:48,039 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-15 07:55:00,343 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 07:55:03,609 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2024-08-15 07:55:10,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3080870.0, ans=0.125 2024-08-15 07:55:10,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-08-15 07:55:38,003 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 07:55:48,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2024-08-15 07:55:56,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3800, loss[loss=0.09532, beats_loss=0.009812, ecapa_loss=0.0001909, whisper_loss=0.0836, over 21168.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01079, ecapa_loss=0.0001515, whisper_loss=0.08956, over 3912961.29 frames. ], batch size: 89, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:55:57,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3081270.0, ans=0.125 2024-08-15 07:56:16,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3081370.0, ans=0.2 2024-08-15 07:56:19,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3081370.0, ans=0.125 2024-08-15 07:56:21,979 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 07:56:22,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2024-08-15 07:56:44,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2024-08-15 07:56:47,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3081570.0, ans=0.1 2024-08-15 07:57:00,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3081670.0, ans=0.2 2024-08-15 07:57:02,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.325e+01 2.607e+01 2.959e+01 1.127e+02, threshold=5.215e+01, percent-clipped=1.0 2024-08-15 07:57:05,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3850, loss[loss=0.117, beats_loss=0.009328, ecapa_loss=0.0001487, whisper_loss=0.1062, over 22830.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01076, ecapa_loss=0.0001503, whisper_loss=0.0903, over 3915920.13 frames. ], batch size: 91, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:57:12,311 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 07:57:12,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3081770.0, ans=10.0 2024-08-15 07:57:14,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3081770.0, ans=0.125 2024-08-15 07:57:14,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3081770.0, ans=0.0 2024-08-15 07:57:25,908 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-15 07:57:27,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3081870.0, ans=0.125 2024-08-15 07:57:43,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=22.5 2024-08-15 07:57:44,041 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 32 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 07:57:54,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3082070.0, ans=0.125 2024-08-15 07:58:07,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3082170.0, ans=0.0 2024-08-15 07:58:14,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3900, loss[loss=0.1063, beats_loss=0.009669, ecapa_loss=0.0001557, whisper_loss=0.0951, over 16567.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001522, whisper_loss=0.09061, over 3880237.97 frames. ], batch size: 66, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:58:23,730 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 07:58:37,524 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.64 vs. limit=22.5 2024-08-15 07:58:58,609 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 07:59:01,611 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 07:59:15,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3082670.0, ans=0.1 2024-08-15 07:59:19,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.371e+01 2.557e+01 2.985e+01 4.331e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 07:59:22,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 3950, loss[loss=0.09066, beats_loss=0.0118, ecapa_loss=0.0001383, whisper_loss=0.07748, over 14153.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.000154, whisper_loss=0.09068, over 3879419.80 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:59:39,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3082870.0, ans=0.0 2024-08-15 07:59:45,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3082870.0, ans=0.125 2024-08-15 07:59:56,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3082970.0, ans=0.0 2024-08-15 07:59:58,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3082970.0, ans=0.125 2024-08-15 08:00:09,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3083070.0, ans=0.125 2024-08-15 08:00:19,840 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 08:00:21,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3083170.0, ans=0.0 2024-08-15 08:00:31,779 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4000, loss[loss=0.08574, beats_loss=0.01284, ecapa_loss=0.0001314, whisper_loss=0.07158, over 21964.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001535, whisper_loss=0.09037, over 3856300.68 frames. ], batch size: 89, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:00:31,963 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 08:00:35,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2024-08-15 08:00:40,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3083270.0, ans=0.125 2024-08-15 08:00:50,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3083370.0, ans=0.0 2024-08-15 08:00:51,524 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 13 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-15 08:01:18,739 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.609e-03 2024-08-15 08:01:20,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3083570.0, ans=0.125 2024-08-15 08:01:33,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2024-08-15 08:01:34,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.44 vs. 
limit=15.0 2024-08-15 08:01:39,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.375e+01 2.633e+01 2.839e+01 4.243e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-15 08:01:41,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3083770.0, ans=0.0 2024-08-15 08:01:42,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4050, loss[loss=0.1015, beats_loss=0.009892, ecapa_loss=0.0001829, whisper_loss=0.08974, over 15367.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.000153, whisper_loss=0.09011, over 3847559.89 frames. ], batch size: 61, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:01:47,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3083770.0, ans=0.125 2024-08-15 08:01:48,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2024-08-15 08:01:50,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3083770.0, ans=0.0 2024-08-15 08:01:53,358 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
33 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 08:02:04,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3083870.0, ans=0.125 2024-08-15 08:02:10,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3083970.0, ans=0.125 2024-08-15 08:02:14,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3083970.0, ans=0.0 2024-08-15 08:02:19,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3083970.0, ans=0.1 2024-08-15 08:02:50,776 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4100, loss[loss=0.0885, beats_loss=0.009156, ecapa_loss=0.0002007, whisper_loss=0.07734, over 21063.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001519, whisper_loss=0.09035, over 3866723.93 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:02:53,779 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 27 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-15 08:03:09,071 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 08:03:12,556 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2024-08-15 08:03:38,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-15 08:03:42,223 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 08:03:55,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3084670.0, ans=22.5 2024-08-15 08:03:56,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3084670.0, ans=0.1 2024-08-15 08:03:57,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.396e+01 2.573e+01 2.888e+01 2.852e+02, threshold=5.147e+01, percent-clipped=1.0 2024-08-15 08:04:00,156 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4150, loss[loss=0.102, beats_loss=0.01274, ecapa_loss=0.0001187, whisper_loss=0.08807, over 19491.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001518, whisper_loss=0.0912, over 3868989.25 frames. ], batch size: 79, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:04:00,340 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 08:04:17,274 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 08:04:35,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3084970.0, ans=0.025 2024-08-15 08:04:39,229 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
32 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 08:04:50,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3085070.0, ans=0.025 2024-08-15 08:04:51,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3085070.0, ans=0.0 2024-08-15 08:04:52,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-15 08:05:09,060 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4200, loss[loss=0.1135, beats_loss=0.0078, ecapa_loss=0.0001684, whisper_loss=0.1041, over 20462.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001519, whisper_loss=0.09044, over 3870462.41 frames. ], batch size: 81, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:05:15,424 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:05:16,423 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 08:05:18,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3085270.0, ans=0.0 2024-08-15 08:05:22,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=12.0 2024-08-15 08:05:28,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3085370.0, ans=0.125 2024-08-15 08:05:31,211 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 08:05:31,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3085370.0, ans=0.125 2024-08-15 08:05:59,958 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 08:06:01,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3085570.0, ans=0.125 2024-08-15 08:06:12,454 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 08:06:15,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.224e+01 2.413e+01 2.831e+01 9.655e+01, threshold=4.827e+01, percent-clipped=1.0 2024-08-15 08:06:18,122 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4250, loss[loss=0.08805, beats_loss=0.0131, ecapa_loss=0.0001449, whisper_loss=0.0735, over 22135.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.000152, whisper_loss=0.0902, over 3863075.13 frames. ], batch size: 94, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:06:29,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2024-08-15 08:06:46,438 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:07:22,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3086170.0, ans=0.2 2024-08-15 08:07:28,338 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4300, loss[loss=0.08938, beats_loss=0.01225, ecapa_loss=0.0001749, whisper_loss=0.07538, over 14885.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.09059, over 3860262.41 frames. 
], batch size: 61, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:07:31,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3086270.0, ans=0.0 2024-08-15 08:07:32,936 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:07:40,735 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 14 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 08:07:41,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3086370.0, ans=0.2 2024-08-15 08:08:10,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3086570.0, ans=0.2 2024-08-15 08:08:20,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3086570.0, ans=0.1 2024-08-15 08:08:22,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3086570.0, ans=0.1 2024-08-15 08:08:24,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3086670.0, ans=0.125 2024-08-15 08:08:33,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3086670.0, ans=0.0 2024-08-15 08:08:35,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.282e+01 2.452e+01 2.695e+01 5.506e+01, threshold=4.904e+01, percent-clipped=1.0 2024-08-15 08:08:38,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4350, loss[loss=0.1209, beats_loss=0.009247, ecapa_loss=0.0001705, whisper_loss=0.11, over 22197.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.000153, whisper_loss=0.08985, over 3828080.17 frames. 
], batch size: 89, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:08:42,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3086770.0, ans=0.5 2024-08-15 08:08:52,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3086870.0, ans=0.125 2024-08-15 08:08:56,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086870.0, ans=0.1 2024-08-15 08:09:10,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3086970.0, ans=0.125 2024-08-15 08:09:12,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3086970.0, ans=0.0 2024-08-15 08:09:23,652 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-15 08:09:30,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3087070.0, ans=0.125 2024-08-15 08:09:37,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3087170.0, ans=0.2 2024-08-15 08:09:41,663 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 08:09:43,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3087170.0, ans=0.2 2024-08-15 08:09:46,900 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4400, loss[loss=0.1044, beats_loss=0.009407, ecapa_loss=0.0002173, whisper_loss=0.09283, over 18323.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001529, whisper_loss=0.09088, over 3854314.74 frames. 
], batch size: 78, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:09:54,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087270.0, ans=0.1 2024-08-15 08:10:05,838 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 08:10:22,301 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 08:10:29,239 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 08:10:37,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3087570.0, ans=0.125 2024-08-15 08:10:39,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3087570.0, ans=0.2 2024-08-15 08:10:42,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3087670.0, ans=0.07 2024-08-15 08:10:45,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3087670.0, ans=0.1 2024-08-15 08:10:46,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3087670.0, ans=0.0 2024-08-15 08:10:52,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.331e+01 2.564e+01 2.900e+01 4.263e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-15 08:10:55,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4450, loss[loss=0.1049, beats_loss=0.01134, ecapa_loss=0.0001522, whisper_loss=0.092, over 22267.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001521, whisper_loss=0.09084, over 3880626.21 frames. 
], batch size: 93, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:10:57,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3087770.0, ans=0.5 2024-08-15 08:10:58,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3087770.0, ans=0.0 2024-08-15 08:11:06,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3087770.0, ans=0.125 2024-08-15 08:11:19,078 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 08:11:37,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3088070.0, ans=0.125 2024-08-15 08:11:39,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3088070.0, ans=0.125 2024-08-15 08:11:43,149 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 08:11:51,505 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 08:11:55,889 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 08:12:05,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4500, loss[loss=0.09742, beats_loss=0.009554, ecapa_loss=0.0001191, whisper_loss=0.08668, over 15379.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001523, whisper_loss=0.0902, over 3875137.73 frames. 
], batch size: 58, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:12:16,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3088270.0, ans=0.125 2024-08-15 08:12:20,622 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-15 08:12:40,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3088470.0, ans=0.125 2024-08-15 08:12:45,618 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 08:12:50,338 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=22.5 2024-08-15 08:12:54,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3088570.0, ans=0.07 2024-08-15 08:13:03,438 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-15 08:13:11,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.272e+01 2.536e+01 2.739e+01 4.209e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-15 08:13:13,968 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4550, loss[loss=0.1158, beats_loss=0.01027, ecapa_loss=0.000137, whisper_loss=0.1041, over 22810.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.000153, whisper_loss=0.08951, over 3880234.42 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:13:21,524 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 08:13:28,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.76 vs. 
limit=22.5 2024-08-15 08:13:34,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3088870.0, ans=0.0 2024-08-15 08:13:38,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=15.0 2024-08-15 08:13:40,039 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 08:13:45,556 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 08:14:13,332 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 08:14:18,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=12.0 2024-08-15 08:14:23,252 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4600, loss[loss=0.1079, beats_loss=0.01094, ecapa_loss=0.0001456, whisper_loss=0.09548, over 22055.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01068, ecapa_loss=0.000152, whisper_loss=0.08902, over 3853331.55 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:14:24,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3089270.0, ans=0.125 2024-08-15 08:14:30,864 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 08:14:31,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3089270.0, ans=0.2 2024-08-15 08:14:33,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. 
limit=6.0 2024-08-15 08:14:41,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3089370.0, ans=0.1 2024-08-15 08:14:44,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3089370.0, ans=0.2 2024-08-15 08:14:46,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3089370.0, ans=0.125 2024-08-15 08:14:56,871 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 08:15:09,796 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-15 08:15:13,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3089570.0, ans=0.125 2024-08-15 08:15:15,793 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 08:15:26,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3089670.0, ans=0.125 2024-08-15 08:15:29,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3089670.0, ans=0.0 2024-08-15 08:15:33,662 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 08:15:34,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3089670.0, ans=0.125 2024-08-15 08:15:34,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.352e+01 2.617e+01 2.995e+01 7.008e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-15 08:15:37,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4650, loss[loss=0.116, beats_loss=0.01016, ecapa_loss=0.0001341, whisper_loss=0.1045, over 19646.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001512, whisper_loss=0.0897, over 3865936.18 frames. ], batch size: 75, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:15:38,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3089770.0, ans=0.95 2024-08-15 08:15:49,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3089770.0, ans=0.125 2024-08-15 08:16:03,832 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 08:16:18,975 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 08:16:20,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3089970.0, ans=0.125 2024-08-15 08:16:21,811 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 08:16:54,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4700, loss[loss=0.06708, beats_loss=0.01156, ecapa_loss=0.0001114, whisper_loss=0.0544, over 21009.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001501, whisper_loss=0.09009, over 3876746.63 frames. 
], batch size: 81, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:16:55,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3090270.0, ans=0.0 2024-08-15 08:16:57,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3090270.0, ans=0.2 2024-08-15 08:17:51,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3090570.0, ans=0.0 2024-08-15 08:18:07,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3090670.0, ans=0.125 2024-08-15 08:18:12,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.393e+01 2.597e+01 3.087e+01 7.330e+01, threshold=5.194e+01, percent-clipped=1.0 2024-08-15 08:18:15,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4750, loss[loss=0.1148, beats_loss=0.01197, ecapa_loss=0.0001265, whisper_loss=0.1015, over 20811.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01072, ecapa_loss=0.0001505, whisper_loss=0.0902, over 3887313.73 frames. ], batch size: 80, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:18:16,260 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 08:18:28,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3090770.0, ans=0.0 2024-08-15 08:18:42,256 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 08:19:13,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=12.0 2024-08-15 08:19:14,712 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-15 08:19:32,936 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4800, loss[loss=0.1058, beats_loss=0.009425, ecapa_loss=0.0001782, whisper_loss=0.09463, over 22090.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001518, whisper_loss=0.09021, over 3907764.39 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:19:47,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3091370.0, ans=0.125 2024-08-15 08:20:17,874 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 08:20:22,221 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 08:20:29,992 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 08:20:39,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3091670.0, ans=0.0 2024-08-15 08:20:40,764 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-15 08:20:41,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3091670.0, ans=0.09899494936611666 2024-08-15 08:20:44,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. 
limit=6.0 2024-08-15 08:20:50,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.389e+01 2.640e+01 2.976e+01 3.395e+02, threshold=5.281e+01, percent-clipped=5.0 2024-08-15 08:20:51,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3091770.0, ans=0.125 2024-08-15 08:20:52,425 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4850, loss[loss=0.1064, beats_loss=0.01094, ecapa_loss=0.0001742, whisper_loss=0.09375, over 21404.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01078, ecapa_loss=0.0001518, whisper_loss=0.08974, over 3937372.68 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:21:25,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091970.0, ans=0.1 2024-08-15 08:21:28,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3091970.0, ans=0.125 2024-08-15 08:21:33,118 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 08:21:41,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3092070.0, ans=0.125 2024-08-15 08:21:45,332 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 08:21:59,725 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 08:22:04,296 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 33 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 08:22:12,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4900, loss[loss=0.1095, beats_loss=0.009195, ecapa_loss=0.0001567, whisper_loss=0.09873, over 22132.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001508, whisper_loss=0.09037, over 3930734.88 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:22:48,672 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 08:23:04,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3092570.0, ans=0.1 2024-08-15 08:23:06,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3092570.0, ans=0.125 2024-08-15 08:23:12,382 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 08:23:12,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3092570.0, ans=0.0 2024-08-15 08:23:15,520 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 08:23:21,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3092670.0, ans=0.0 2024-08-15 08:23:28,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3092670.0, ans=0.125 2024-08-15 08:23:30,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.221e+01 2.391e+01 2.667e+01 3.893e+01, threshold=4.783e+01, percent-clipped=0.0 2024-08-15 08:23:32,564 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 4950, loss[loss=0.108, beats_loss=0.008159, ecapa_loss=0.000146, whisper_loss=0.09841, over 21009.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001509, whisper_loss=0.09054, over 3932527.86 frames. 
], batch size: 81, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:23:34,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-15 08:23:39,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3092770.0, ans=0.07 2024-08-15 08:23:40,419 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 08:23:57,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3092870.0, ans=0.0 2024-08-15 08:24:17,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3093070.0, ans=0.1 2024-08-15 08:24:18,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3093070.0, ans=0.125 2024-08-15 08:24:18,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3093070.0, ans=0.0 2024-08-15 08:24:24,698 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 08:24:27,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3093070.0, ans=0.07 2024-08-15 08:24:29,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3093070.0, ans=0.0 2024-08-15 08:24:30,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3093070.0, ans=0.0 2024-08-15 08:24:33,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3093170.0, ans=0.1 2024-08-15 08:24:47,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3093270.0, ans=0.125 2024-08-15 08:24:48,033 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5000, loss[loss=0.1336, beats_loss=0.009088, ecapa_loss=0.0001629, whisper_loss=0.1228, over 19842.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001514, whisper_loss=0.09129, over 3918271.17 frames. ], batch size: 79, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:24:50,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3093270.0, ans=0.125 2024-08-15 08:24:51,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3093270.0, ans=0.2 2024-08-15 08:24:55,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-08-15 08:24:58,321 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 08:25:01,449 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-15 08:25:10,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3093370.0, ans=0.125 2024-08-15 08:25:11,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3093370.0, ans=0.125 2024-08-15 08:25:16,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3093370.0, ans=0.125 2024-08-15 08:25:27,589 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 08:25:41,519 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0 2024-08-15 08:25:43,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2024-08-15 08:26:03,527 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 08:26:07,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.349e+01 2.649e+01 2.919e+01 1.466e+02, threshold=5.298e+01, percent-clipped=4.0 2024-08-15 08:26:09,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5050, loss[loss=0.1032, beats_loss=0.01173, ecapa_loss=0.000154, whisper_loss=0.08989, over 21959.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001512, whisper_loss=0.09092, over 3931738.35 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:26:09,368 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
18 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 08:26:18,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3093770.0, ans=0.0 2024-08-15 08:26:50,726 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 08:26:52,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3093970.0, ans=0.125 2024-08-15 08:27:08,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3094070.0, ans=0.125 2024-08-15 08:27:25,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094170.0, ans=0.1 2024-08-15 08:27:25,605 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.742e+00 2024-08-15 08:27:30,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5100, loss[loss=0.1111, beats_loss=0.009672, ecapa_loss=0.0001406, whisper_loss=0.09998, over 22654.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001508, whisper_loss=0.09121, over 3934297.67 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:27:47,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3094370.0, ans=0.125 2024-08-15 08:27:50,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3094370.0, ans=0.1 2024-08-15 08:28:01,904 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. 
limit=15.0 2024-08-15 08:28:03,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3094470.0, ans=0.125 2024-08-15 08:28:05,314 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-15 08:28:21,717 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 08:28:50,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.353e+01 2.564e+01 3.013e+01 4.662e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-15 08:28:50,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3094770.0, ans=0.125 2024-08-15 08:28:50,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3094770.0, ans=0.09899494936611666 2024-08-15 08:28:51,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5150, loss[loss=0.09945, beats_loss=0.009067, ecapa_loss=0.000173, whisper_loss=0.08865, over 14781.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001495, whisper_loss=0.09145, over 3927911.13 frames. ], batch size: 63, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:29:08,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3094870.0, ans=0.125 2024-08-15 08:29:27,733 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
34 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 08:29:38,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3095070.0, ans=0.125 2024-08-15 08:29:42,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3095070.0, ans=0.125 2024-08-15 08:29:43,968 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 08:30:00,495 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-15 08:30:12,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5200, loss[loss=0.0908, beats_loss=0.009032, ecapa_loss=0.0001769, whisper_loss=0.08, over 14914.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001485, whisper_loss=0.09184, over 3939247.19 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:30:16,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3095270.0, ans=0.1 2024-08-15 08:30:37,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2024-08-15 08:30:44,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.34 vs. limit=22.5 2024-08-15 08:31:18,788 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 08:31:31,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.296e+01 2.558e+01 2.889e+01 4.447e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-15 08:31:32,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5250, loss[loss=0.09268, beats_loss=0.014, ecapa_loss=0.0001198, whisper_loss=0.07748, over 19263.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001489, whisper_loss=0.09081, over 3899454.78 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:31:33,234 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 08:31:46,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3095770.0, ans=0.0 2024-08-15 08:31:47,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3095870.0, ans=0.125 2024-08-15 08:31:49,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3095870.0, ans=0.035 2024-08-15 08:31:51,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.74 vs. limit=10.0 2024-08-15 08:31:53,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3095870.0, ans=0.0 2024-08-15 08:31:55,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3095870.0, ans=0.125 2024-08-15 08:31:56,903 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 08:31:57,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3095870.0, ans=0.125 2024-08-15 08:32:01,815 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 08:32:48,722 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 08:32:51,562 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5300, loss[loss=0.101, beats_loss=0.01292, ecapa_loss=0.0001365, whisper_loss=0.08675, over 23490.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001508, whisper_loss=0.09089, over 3872022.66 frames. ], batch size: 96, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:32:55,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-15 08:33:00,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3096270.0, ans=0.2 2024-08-15 08:33:24,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3096470.0, ans=0.125 2024-08-15 08:33:31,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2024-08-15 08:33:47,154 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 08:33:59,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3096670.0, ans=0.125 2024-08-15 08:34:08,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3096670.0, ans=0.0 2024-08-15 08:34:11,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.309e+01 2.587e+01 2.813e+01 1.007e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-15 08:34:11,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3096770.0, ans=0.125 2024-08-15 08:34:12,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5350, loss[loss=0.1124, beats_loss=0.01087, ecapa_loss=0.0001606, whisper_loss=0.09994, over 21576.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001495, whisper_loss=0.0905, over 3885610.53 frames. ], batch size: 86, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:34:15,391 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-15 08:34:42,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3096870.0, ans=0.125 2024-08-15 08:34:52,915 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 08:34:53,918 WARNING [optim.py:496] (1/4) Scaling gradients by 0.02987569198012352, model_norm_threshold=51.74189376831055 2024-08-15 08:34:54,105 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.299e+06, grad_sumsq=1.297e+08, orig_rms_sq=1.001e-02 2024-08-15 08:35:04,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3097070.0, ans=0.125 2024-08-15 08:35:18,648 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 08:35:27,661 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 08:35:34,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5400, loss[loss=0.1094, beats_loss=0.008965, ecapa_loss=0.0001436, whisper_loss=0.09899, over 24441.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001492, whisper_loss=0.09042, over 3897380.60 frames. 
], batch size: 96, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:35:48,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3097270.0, ans=0.07 2024-08-15 08:36:02,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3097370.0, ans=0.0 2024-08-15 08:36:21,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3097470.0, ans=0.0 2024-08-15 08:36:27,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3097470.0, ans=0.07 2024-08-15 08:36:44,862 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07409150153398514, model_norm_threshold=51.74189376831055 2024-08-15 08:36:45,033 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.947e+04, grad_sumsq=3.947e+04, orig_rms_sq=1.000e+00 2024-08-15 08:36:45,298 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-15 08:36:54,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3097670.0, ans=0.0 2024-08-15 08:36:59,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.348e+01 2.686e+01 2.991e+01 1.732e+03, threshold=5.372e+01, percent-clipped=3.0 2024-08-15 08:37:00,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5450, loss[loss=0.09207, beats_loss=0.008763, ecapa_loss=0.0002237, whisper_loss=0.08107, over 20996.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.0001489, whisper_loss=0.08999, over 3887495.76 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:37:03,994 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 08:37:47,039 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 08:37:49,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3097970.0, ans=0.0 2024-08-15 08:37:50,959 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 08:37:54,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3097970.0, ans=0.125 2024-08-15 08:37:58,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3097970.0, ans=0.125 2024-08-15 08:38:43,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5500, loss[loss=0.104, beats_loss=0.01136, ecapa_loss=0.0001325, whisper_loss=0.09131, over 19287.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001501, whisper_loss=0.09046, over 3881523.80 frames. ], batch size: 72, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:39:02,048 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 08:39:13,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3098370.0, ans=0.0 2024-08-15 08:40:09,213 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 08:40:11,553 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 08:40:19,646 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
22 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-15 08:40:22,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3098670.0, ans=0.125 2024-08-15 08:40:24,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.188e+01 2.412e+01 2.681e+01 4.153e+01, threshold=4.824e+01, percent-clipped=0.0 2024-08-15 08:40:28,345 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5550, loss[loss=0.08499, beats_loss=0.008129, ecapa_loss=0.0001789, whisper_loss=0.07507, over 17274.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001499, whisper_loss=0.09025, over 3865317.10 frames. ], batch size: 68, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:40:29,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3098770.0, ans=0.125 2024-08-15 08:40:48,482 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=15.0 2024-08-15 08:41:02,376 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-15 08:41:07,594 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 08:41:15,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3098870.0, ans=6.0 2024-08-15 08:41:58,989 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 08:42:04,411 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 08:42:09,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3099170.0, ans=0.125 2024-08-15 08:42:26,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.52 vs. limit=6.0 2024-08-15 08:42:29,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5600, loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001436, whisper_loss=0.09101, over 20376.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001496, whisper_loss=0.09023, over 3895734.38 frames. ], batch size: 82, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:42:44,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3099270.0, ans=0.1 2024-08-15 08:42:46,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3099270.0, ans=0.1 2024-08-15 08:42:49,099 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-15 08:43:05,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=15.0 2024-08-15 08:43:18,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3099370.0, ans=0.0 2024-08-15 08:43:32,124 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 08:44:14,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=15.0 2024-08-15 08:44:26,093 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 08:44:35,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.363e+01 2.562e+01 2.966e+01 4.681e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-15 08:44:36,797 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5650, loss[loss=0.09989, beats_loss=0.0124, ecapa_loss=0.0001509, whisper_loss=0.08598, over 21839.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.000151, whisper_loss=0.09021, over 3938383.57 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:44:44,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3099770.0, ans=0.5 2024-08-15 08:44:54,312 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 08:45:03,077 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:45:22,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3099870.0, ans=0.125 2024-08-15 08:45:51,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3100070.0, ans=0.125 2024-08-15 08:46:03,625 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-15 08:46:19,938 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 08:46:23,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5700, loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001529, whisper_loss=0.09031, over 18576.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001519, whisper_loss=0.09065, over 3925967.97 frames. 
], batch size: 76, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:46:31,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3100270.0, ans=0.125 2024-08-15 08:46:32,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3100270.0, ans=0.1 2024-08-15 08:46:35,558 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-15 08:46:38,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3100370.0, ans=0.125 2024-08-15 08:47:03,136 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-15 08:47:05,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3100470.0, ans=15.0 2024-08-15 08:47:14,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3100570.0, ans=0.0 2024-08-15 08:47:20,186 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 08:47:21,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3100570.0, ans=0.0 2024-08-15 08:47:29,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-08-15 08:47:33,646 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 08:47:42,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3100670.0, ans=0.0 2024-08-15 08:47:43,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.415e+01 2.624e+01 2.975e+01 2.275e+02, threshold=5.249e+01, percent-clipped=3.0 2024-08-15 08:47:44,784 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5750, loss[loss=0.07609, beats_loss=0.01207, ecapa_loss=0.0001345, whisper_loss=0.06267, over 18865.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001512, whisper_loss=0.09056, over 3901140.88 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:47:57,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3100770.0, ans=0.2 2024-08-15 08:48:03,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3100870.0, ans=0.1 2024-08-15 08:48:26,221 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-15 08:48:38,616 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 08:48:49,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3101170.0, ans=0.1 2024-08-15 08:49:01,477 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 08:49:05,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5800, loss[loss=0.112, beats_loss=0.01002, ecapa_loss=0.0001677, whisper_loss=0.1003, over 19416.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01072, ecapa_loss=0.0001503, whisper_loss=0.09013, over 3905373.87 frames. 
], batch size: 80, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:49:25,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3101370.0, ans=0.125 2024-08-15 08:49:28,106 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 08:49:50,658 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 08:50:03,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3101570.0, ans=0.125 2024-08-15 08:50:03,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3101570.0, ans=0.0 2024-08-15 08:50:23,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.392e+01 2.742e+01 3.107e+01 2.079e+02, threshold=5.485e+01, percent-clipped=4.0 2024-08-15 08:50:24,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3101770.0, ans=0.125 2024-08-15 08:50:25,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5850, loss[loss=0.09715, beats_loss=0.01199, ecapa_loss=0.0001743, whisper_loss=0.08342, over 15927.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001512, whisper_loss=0.09082, over 3894478.59 frames. 
], batch size: 67, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:50:37,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3101770.0, ans=0.2 2024-08-15 08:50:51,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3101870.0, ans=0.0 2024-08-15 08:50:52,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2024-08-15 08:50:55,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3101870.0, ans=0.0 2024-08-15 08:51:03,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3101970.0, ans=0.125 2024-08-15 08:51:09,509 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-15 08:51:12,838 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-15 08:51:17,365 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 08:51:27,940 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 11 from Vox, 47 fro AS 2024-08-15 08:51:34,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3102170.0, ans=0.04949747468305833 2024-08-15 08:51:39,179 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 08:51:44,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5900, loss[loss=0.09545, beats_loss=0.01078, ecapa_loss=0.000161, whisper_loss=0.08306, over 20624.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001517, whisper_loss=0.09027, over 3891374.67 frames. 
], batch size: 82, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:51:59,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3102370.0, ans=0.125 2024-08-15 08:52:01,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3102370.0, ans=0.1 2024-08-15 08:52:19,209 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 08:52:22,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3102470.0, ans=0.0 2024-08-15 08:52:32,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102570.0, ans=0.1 2024-08-15 08:52:46,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102670.0, ans=0.1 2024-08-15 08:52:50,273 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 08:52:59,506 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.266e+01 2.479e+01 2.808e+01 3.444e+02, threshold=4.958e+01, percent-clipped=1.0 2024-08-15 08:53:01,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 5950, loss[loss=0.09981, beats_loss=0.0113, ecapa_loss=0.0001248, whisper_loss=0.08727, over 21904.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0108, ecapa_loss=0.000151, whisper_loss=0.08995, over 3897412.24 frames. ], batch size: 84, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:53:03,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3102770.0, ans=0.125 2024-08-15 08:53:05,194 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 08:53:36,486 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-15 08:53:43,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3102970.0, ans=0.2 2024-08-15 08:53:49,024 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 08:53:51,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3103070.0, ans=0.125 2024-08-15 08:53:56,989 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 08:53:58,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3103070.0, ans=0.125 2024-08-15 08:54:15,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.24 vs. limit=10.0 2024-08-15 08:54:18,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6000, loss[loss=0.1175, beats_loss=0.007902, ecapa_loss=0.0001954, whisper_loss=0.1076, over 18535.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001499, whisper_loss=0.09061, over 3906088.33 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:54:18,664 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 08:54:59,182 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on ASR_libri: loss=0.2524, beats_loss=0, ecapa_loss=0.0005326, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 08:55:14,694 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on SV_voxceleb1: loss=0.004204, beats_loss=0, ecapa_loss=0.0004204, whisper_loss=0, over 939242.00 frames. 
2024-08-15 08:57:13,978 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 08:57:13,982 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 08:57:24,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3103270.0, ans=0.2 2024-08-15 08:57:24,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3103270.0, ans=0.125 2024-08-15 08:57:40,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3103370.0, ans=0.025 2024-08-15 08:57:52,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3103470.0, ans=0.125 2024-08-15 08:58:00,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3103570.0, ans=0.125 2024-08-15 08:58:13,608 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 41 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 08:58:16,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. 
limit=15.0 2024-08-15 08:58:19,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3103670.0, ans=0.1 2024-08-15 08:58:28,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.350e+01 2.585e+01 2.886e+01 6.077e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-15 08:58:30,789 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6050, loss[loss=0.1076, beats_loss=0.009669, ecapa_loss=0.0002051, whisper_loss=0.09588, over 20341.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001486, whisper_loss=0.09082, over 3926080.34 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:58:32,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3103770.0, ans=0.125 2024-08-15 08:58:38,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3103770.0, ans=0.0 2024-08-15 08:58:48,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3103870.0, ans=0.125 2024-08-15 08:59:03,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3103970.0, ans=0.125 2024-08-15 08:59:06,997 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
20 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 08:59:15,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3104070.0, ans=0.07 2024-08-15 08:59:24,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3104070.0, ans=0.05 2024-08-15 08:59:37,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-15 08:59:39,840 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 08:59:42,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2024-08-15 08:59:45,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6100, loss[loss=0.1128, beats_loss=0.012, ecapa_loss=0.0001488, whisper_loss=0.0993, over 23161.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001491, whisper_loss=0.09039, over 3889780.83 frames. ], batch size: 94, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:59:55,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3104270.0, ans=0.025 2024-08-15 08:59:59,129 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-15 09:00:09,513 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 09:00:12,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3104370.0, ans=0.125 2024-08-15 09:00:22,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3104470.0, ans=0.125 2024-08-15 09:00:27,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=15.0 2024-08-15 09:00:45,562 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 09:00:57,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.233e+01 2.517e+01 2.744e+01 4.126e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-15 09:00:58,739 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6150, loss[loss=0.0922, beats_loss=0.01215, ecapa_loss=0.0001457, whisper_loss=0.0786, over 21563.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001498, whisper_loss=0.09085, over 3911845.37 frames. ], batch size: 89, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:01:03,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0 2024-08-15 09:01:10,321 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 09:01:14,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3104870.0, ans=0.125 2024-08-15 09:01:27,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.62 vs. 
limit=15.0 2024-08-15 09:01:32,433 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.060e+05 2024-08-15 09:01:35,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3104970.0, ans=0.125 2024-08-15 09:01:57,148 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 18 from Vox, 53 fro AS 2024-08-15 09:02:00,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3105170.0, ans=0.0 2024-08-15 09:02:03,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3105170.0, ans=0.125 2024-08-15 09:02:11,826 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 09:02:13,079 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6200, loss[loss=0.1198, beats_loss=0.008316, ecapa_loss=0.0001498, whisper_loss=0.1099, over 23074.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.000149, whisper_loss=0.09115, over 3919936.96 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:02:14,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3105270.0, ans=0.125 2024-08-15 09:02:23,581 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 09:02:37,631 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 09:02:56,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3105570.0, ans=0.0 2024-08-15 09:03:04,288 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 09:03:26,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.264e+01 2.437e+01 2.763e+01 4.898e+01, threshold=4.875e+01, percent-clipped=0.0 2024-08-15 09:03:28,389 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6250, loss[loss=0.1134, beats_loss=0.01147, ecapa_loss=0.0001423, whisper_loss=0.1006, over 20767.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001491, whisper_loss=0.09166, over 3924995.06 frames. ], batch size: 81, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:03:42,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-15 09:03:49,576 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-15 09:04:00,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3105970.0, ans=0.05 2024-08-15 09:04:09,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3105970.0, ans=0.125 2024-08-15 09:04:45,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6300, loss[loss=0.09957, beats_loss=0.01232, ecapa_loss=0.0001437, whisper_loss=0.08582, over 22877.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001494, whisper_loss=0.09133, over 3930154.69 frames. 
], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:05:10,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3106370.0, ans=15.0 2024-08-15 09:05:37,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3106570.0, ans=0.0 2024-08-15 09:05:55,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3106670.0, ans=0.0 2024-08-15 09:06:02,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3106670.0, ans=0.125 2024-08-15 09:06:03,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-15 09:06:07,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.366e+01 2.627e+01 2.982e+01 5.649e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-15 09:06:09,648 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6350, loss[loss=0.111, beats_loss=0.009577, ecapa_loss=0.0001702, whisper_loss=0.09975, over 15998.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001504, whisper_loss=0.09122, over 3923045.93 frames. 
], batch size: 64, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:06:12,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3106770.0, ans=0.1 2024-08-15 09:06:18,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3106770.0, ans=0.1 2024-08-15 09:06:27,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3106870.0, ans=0.125 2024-08-15 09:06:34,032 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-15 09:06:35,198 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-15 09:06:51,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-08-15 09:06:56,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3107070.0, ans=0.125 2024-08-15 09:06:59,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3107070.0, ans=0.1 2024-08-15 09:07:04,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3107070.0, ans=0.2 2024-08-15 09:07:07,239 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 09:07:18,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3107170.0, ans=0.0 2024-08-15 09:07:30,271 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6400, loss[loss=0.1127, beats_loss=0.01001, ecapa_loss=0.0001594, whisper_loss=0.1011, over 22040.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.000151, whisper_loss=0.09115, over 3927289.89 frames. ], batch size: 87, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:07:30,444 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-15 09:07:54,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3107370.0, ans=0.1 2024-08-15 09:07:56,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3107370.0, ans=0.125 2024-08-15 09:08:03,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2024-08-15 09:08:27,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3107570.0, ans=0.125 2024-08-15 09:08:35,162 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2024-08-15 09:08:35,793 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 09:08:52,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.320e+01 2.533e+01 2.838e+01 5.335e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-15 09:08:54,039 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6450, loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001596, whisper_loss=0.0899, over 18701.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001518, whisper_loss=0.09069, over 3930325.38 frames. ], batch size: 75, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:08:56,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3107770.0, ans=0.125 2024-08-15 09:09:11,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3107870.0, ans=0.2 2024-08-15 09:09:21,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3107870.0, ans=0.2 2024-08-15 09:09:22,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3107870.0, ans=0.0 2024-08-15 09:09:27,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3107970.0, ans=0.125 2024-08-15 09:09:29,721 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:09:35,078 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-15 09:09:48,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3108070.0, ans=0.125 2024-08-15 09:10:00,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3108170.0, ans=0.0 2024-08-15 09:10:02,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3108170.0, ans=0.0 2024-08-15 09:10:16,111 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6500, loss[loss=0.1293, beats_loss=0.007893, ecapa_loss=0.0001547, whisper_loss=0.1199, over 23072.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.000152, whisper_loss=0.09133, over 3928789.32 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:10:18,006 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 09:10:39,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.01 vs. limit=22.5 2024-08-15 09:10:43,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3108370.0, ans=0.95 2024-08-15 09:11:06,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-08-15 09:11:07,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3108570.0, ans=0.0 2024-08-15 09:11:22,818 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-15 09:11:23,494 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:11:33,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.377e+01 2.603e+01 2.970e+01 3.973e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 09:11:34,753 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6550, loss[loss=0.09398, beats_loss=0.01233, ecapa_loss=0.0001397, whisper_loss=0.08025, over 22409.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001513, whisper_loss=0.09112, over 3922279.59 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:11:37,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3108770.0, ans=0.1 2024-08-15 09:11:47,415 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 09:11:48,681 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-15 09:11:52,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3108870.0, ans=0.2 2024-08-15 09:12:04,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3108870.0, ans=0.125 2024-08-15 09:12:39,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3109170.0, ans=0.0 2024-08-15 09:12:53,755 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6600, loss[loss=0.08468, beats_loss=0.01155, ecapa_loss=0.0001455, whisper_loss=0.07168, over 22149.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001519, whisper_loss=0.09111, over 3944749.30 frames. 
], batch size: 93, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:12:56,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3109270.0, ans=0.125 2024-08-15 09:13:03,833 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-15 09:13:13,871 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 09:13:16,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3109370.0, ans=0.0 2024-08-15 09:13:16,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3109370.0, ans=0.1 2024-08-15 09:13:32,611 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 09:13:50,476 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 09:14:00,337 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=12.0 2024-08-15 09:14:02,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3109670.0, ans=0.125 2024-08-15 09:14:08,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3109670.0, ans=0.0 2024-08-15 09:14:10,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.330e+01 2.492e+01 2.798e+01 4.030e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 09:14:11,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6650, loss[loss=0.1079, beats_loss=0.009367, ecapa_loss=0.0001401, whisper_loss=0.09711, over 17066.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.000152, whisper_loss=0.09136, over 3936034.11 frames. ], batch size: 66, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:14:14,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3109770.0, ans=0.1 2024-08-15 09:14:18,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3109770.0, ans=0.09899494936611666 2024-08-15 09:14:24,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3109770.0, ans=0.0 2024-08-15 09:14:34,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3109870.0, ans=0.0 2024-08-15 09:15:03,631 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-15 09:15:22,851 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 09:15:23,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3110170.0, ans=0.125 2024-08-15 09:15:26,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3110170.0, ans=0.125 2024-08-15 09:15:31,280 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6700, loss[loss=0.1288, beats_loss=0.00859, ecapa_loss=0.0001506, whisper_loss=0.1187, over 22426.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001523, whisper_loss=0.09116, over 3899705.62 frames. 
], batch size: 87, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:15:48,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-15 09:15:59,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3110370.0, ans=0.125 2024-08-15 09:16:00,659 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-15 09:16:41,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3110670.0, ans=0.1 2024-08-15 09:16:55,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.348e+01 2.579e+01 2.866e+01 4.401e+01, threshold=5.159e+01, percent-clipped=0.0 2024-08-15 09:16:56,755 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6750, loss[loss=0.1064, beats_loss=0.009628, ecapa_loss=0.0001495, whisper_loss=0.09527, over 22847.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01046, ecapa_loss=0.0001531, whisper_loss=0.09181, over 3896585.76 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:17:03,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3110770.0, ans=0.125 2024-08-15 09:17:03,554 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.786e-01 2024-08-15 09:17:05,916 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 09:17:06,782 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.791e+01 2024-08-15 09:17:23,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3110870.0, ans=0.0 2024-08-15 09:17:31,662 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 09:17:32,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3110970.0, ans=0.2 2024-08-15 09:17:40,856 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 09:18:05,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3111170.0, ans=0.1 2024-08-15 09:18:09,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3111170.0, ans=0.2 2024-08-15 09:18:10,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.52 vs. limit=10.0 2024-08-15 09:18:21,467 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6800, loss[loss=0.09492, beats_loss=0.01192, ecapa_loss=0.0001634, whisper_loss=0.08137, over 19983.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001531, whisper_loss=0.09142, over 3893204.33 frames. 
], batch size: 79, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:18:55,007 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:18:56,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3111470.0, ans=0.125 2024-08-15 09:19:07,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5 2024-08-15 09:19:19,485 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 33 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-15 09:19:41,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.366e+01 2.737e+01 3.020e+01 4.133e+01, threshold=5.473e+01, percent-clipped=0.0 2024-08-15 09:19:42,158 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 09:19:43,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6850, loss[loss=0.09421, beats_loss=0.01141, ecapa_loss=0.0001292, whisper_loss=0.08151, over 17997.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001531, whisper_loss=0.09058, over 3847942.19 frames. ], batch size: 70, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:19:49,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3111770.0, ans=0.2 2024-08-15 09:20:02,732 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 09:20:21,963 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 09:20:22,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3111970.0, ans=0.125 2024-08-15 09:20:27,162 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 09:20:28,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2024-08-15 09:20:32,322 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 09:20:43,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3112070.0, ans=0.0 2024-08-15 09:20:54,743 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 23 from Vox, 15 fro AS 2024-08-15 09:21:02,363 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 09:21:05,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6900, loss[loss=0.09973, beats_loss=0.009534, ecapa_loss=0.0001812, whisper_loss=0.08838, over 14347.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001529, whisper_loss=0.09044, over 3848599.56 frames. ], batch size: 58, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:21:19,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3112270.0, ans=0.1 2024-08-15 09:21:29,115 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 33 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 09:21:37,405 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 09:21:53,743 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 09:21:57,680 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 09:22:07,442 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 09:22:07,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3112570.0, ans=0.2 2024-08-15 09:22:12,078 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 09:22:18,187 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 09:22:25,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.330e+01 2.607e+01 2.906e+01 3.903e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-15 09:22:27,869 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 6950, loss[loss=0.1118, beats_loss=0.01055, ecapa_loss=0.0001575, whisper_loss=0.09965, over 22376.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001515, whisper_loss=0.0902, over 3840508.03 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:22:35,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3112770.0, ans=0.05 2024-08-15 09:22:38,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3112770.0, ans=0.05 2024-08-15 09:22:42,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3112770.0, ans=0.09899494936611666 2024-08-15 09:22:47,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3112870.0, ans=0.125 2024-08-15 09:22:52,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112870.0, ans=0.1 2024-08-15 09:23:20,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3113070.0, 
ans=0.0 2024-08-15 09:23:33,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3113070.0, ans=0.125 2024-08-15 09:23:33,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3113070.0, ans=0.2 2024-08-15 09:23:53,452 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7000, loss[loss=0.09798, beats_loss=0.01227, ecapa_loss=0.0001161, whisper_loss=0.08455, over 17523.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09036, over 3850446.01 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:23:56,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3113270.0, ans=0.0 2024-08-15 09:24:12,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3113370.0, ans=0.125 2024-08-15 09:24:43,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3113570.0, ans=0.125 2024-08-15 09:24:54,212 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 09:25:09,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.83 vs. limit=10.0 2024-08-15 09:25:11,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.285e+01 2.515e+01 2.817e+01 4.322e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-15 09:25:13,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7050, loss[loss=0.09828, beats_loss=0.01206, ecapa_loss=0.0001205, whisper_loss=0.08501, over 17471.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001516, whisper_loss=0.09009, over 3857893.13 frames. 
], batch size: 68, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:25:16,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-15 09:25:28,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3113770.0, ans=0.0 2024-08-15 09:25:37,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3113870.0, ans=0.0 2024-08-15 09:25:57,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.01 vs. limit=10.0 2024-08-15 09:26:02,023 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 09:26:08,182 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 09:26:21,166 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-15 09:26:37,071 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7100, loss[loss=0.1109, beats_loss=0.01002, ecapa_loss=0.0001388, whisper_loss=0.09953, over 20293.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001521, whisper_loss=0.08959, over 3852469.07 frames. ], batch size: 79, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:26:41,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3114270.0, ans=0.0 2024-08-15 09:26:45,411 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
28 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 09:26:50,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3114270.0, ans=0.125 2024-08-15 09:26:55,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3114370.0, ans=0.125 2024-08-15 09:27:00,786 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-15 09:27:09,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3114470.0, ans=0.0 2024-08-15 09:27:16,819 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 09:27:21,076 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 09:27:27,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3114570.0, ans=0.2 2024-08-15 09:27:38,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114670.0, ans=0.1 2024-08-15 09:27:47,399 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:27:53,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.689e+01 2.259e+01 2.510e+01 2.858e+01 3.355e+02, threshold=5.020e+01, percent-clipped=2.0 2024-08-15 09:27:54,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114770.0, ans=0.1 2024-08-15 09:27:55,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.05 vs. 
limit=22.5 2024-08-15 09:27:55,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7150, loss[loss=0.1106, beats_loss=0.009732, ecapa_loss=0.0001677, whisper_loss=0.0992, over 19849.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001517, whisper_loss=0.09006, over 3857385.30 frames. ], batch size: 79, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:28:04,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3114770.0, ans=0.125 2024-08-15 09:28:11,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3114870.0, ans=0.125 2024-08-15 09:28:17,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3114870.0, ans=0.125 2024-08-15 09:28:31,783 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-15 09:28:35,612 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-15 09:29:00,116 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 09:29:01,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3115170.0, ans=0.0 2024-08-15 09:29:06,757 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 09:29:18,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7200, loss[loss=0.1038, beats_loss=0.00804, ecapa_loss=0.000159, whisper_loss=0.09418, over 18030.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001512, whisper_loss=0.0903, over 3882533.50 frames. 
], batch size: 69, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:29:30,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3115270.0, ans=0.1 2024-08-15 09:29:38,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-15 09:29:42,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-15 09:29:43,480 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-15 09:29:50,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3115370.0, ans=0.125 2024-08-15 09:29:51,918 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:30:00,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3115470.0, ans=0.07 2024-08-15 09:30:04,526 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.15 vs. limit=22.5 2024-08-15 09:30:06,762 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 09:30:13,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. 
limit=10.0 2024-08-15 09:30:21,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3115570.0, ans=0.125 2024-08-15 09:30:23,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3115570.0, ans=0.125 2024-08-15 09:30:30,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3115670.0, ans=0.04949747468305833 2024-08-15 09:30:41,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.339e+01 2.550e+01 2.963e+01 5.481e+01, threshold=5.099e+01, percent-clipped=2.0 2024-08-15 09:30:42,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3115770.0, ans=0.2 2024-08-15 09:30:43,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7250, loss[loss=0.08889, beats_loss=0.01281, ecapa_loss=0.0001708, whisper_loss=0.07436, over 19540.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001505, whisper_loss=0.08972, over 3885145.64 frames. 
], batch size: 80, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:30:45,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3115770.0, ans=0.125 2024-08-15 09:30:51,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3115770.0, ans=0.2 2024-08-15 09:31:16,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3115970.0, ans=0.035 2024-08-15 09:31:32,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3116070.0, ans=0.125 2024-08-15 09:31:53,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3116170.0, ans=0.0 2024-08-15 09:31:57,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3116170.0, ans=0.125 2024-08-15 09:32:02,223 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7300, loss[loss=0.07759, beats_loss=0.01578, ecapa_loss=0.0001271, whisper_loss=0.06054, over 18712.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.09047, over 3896335.62 frames. ], batch size: 79, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:32:17,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3116370.0, ans=0.125 2024-08-15 09:32:43,049 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 09:32:49,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.34 vs. 
limit=12.0 2024-08-15 09:33:02,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3116570.0, ans=0.09899494936611666 2024-08-15 09:33:19,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3116670.0, ans=0.125 2024-08-15 09:33:20,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.398e+01 2.645e+01 3.010e+01 2.880e+02, threshold=5.290e+01, percent-clipped=2.0 2024-08-15 09:33:22,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7350, loss[loss=0.08704, beats_loss=0.0114, ecapa_loss=0.00013, whisper_loss=0.07434, over 16445.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001492, whisper_loss=0.09051, over 3891508.58 frames. ], batch size: 65, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:33:25,806 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 09:33:44,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-15 09:34:10,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3117070.0, ans=0.125 2024-08-15 09:34:22,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3117070.0, ans=0.0 2024-08-15 09:34:35,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3117170.0, ans=0.0 2024-08-15 09:34:39,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7400, loss[loss=0.09354, beats_loss=0.01214, ecapa_loss=0.0001589, whisper_loss=0.07981, over 21586.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001502, whisper_loss=0.08994, over 3869124.59 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:34:45,226 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 09:34:46,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2024-08-15 09:34:48,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3117270.0, ans=0.125 2024-08-15 09:35:01,149 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-15 09:35:07,800 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 09:35:38,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.96 vs. limit=22.5 2024-08-15 09:35:43,690 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-15 09:35:48,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3117670.0, ans=0.1 2024-08-15 09:35:51,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3117670.0, ans=0.0 2024-08-15 09:35:58,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3117670.0, ans=0.0 2024-08-15 09:35:59,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.404e+01 2.626e+01 2.946e+01 5.024e+01, threshold=5.253e+01, percent-clipped=0.0 2024-08-15 09:36:01,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7450, loss[loss=0.1024, beats_loss=0.01124, ecapa_loss=0.000164, whisper_loss=0.08951, over 21407.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001503, whisper_loss=0.09027, over 3868488.06 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:36:01,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3117770.0, ans=0.125 2024-08-15 09:36:04,266 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 09:36:07,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3117770.0, ans=0.125 2024-08-15 09:36:08,827 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 09:36:12,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3117770.0, ans=0.125 2024-08-15 09:36:13,570 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-15 09:36:50,397 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-15 09:37:02,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3118170.0, ans=0.0 2024-08-15 09:37:09,222 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 09:37:17,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7500, loss[loss=0.1222, beats_loss=0.008537, ecapa_loss=0.0001749, whisper_loss=0.1119, over 21586.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01055, ecapa_loss=0.0001505, whisper_loss=0.09142, over 3881184.86 frames. ], batch size: 87, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:37:28,781 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.36 vs. limit=15.0 2024-08-15 09:37:42,728 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 09:37:45,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3118370.0, ans=0.2 2024-08-15 09:37:51,078 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-15 09:37:51,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3118470.0, ans=0.125 2024-08-15 09:38:01,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3118570.0, ans=0.125 2024-08-15 09:38:07,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3118570.0, ans=0.125 2024-08-15 09:38:15,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2024-08-15 09:38:25,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3118670.0, ans=0.125 2024-08-15 09:38:31,273 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 09:38:32,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.370e+01 2.711e+01 2.994e+01 4.451e+02, threshold=5.422e+01, percent-clipped=4.0 2024-08-15 09:38:32,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7550, loss[loss=0.08964, beats_loss=0.008819, ecapa_loss=0.0001498, whisper_loss=0.07932, over 15246.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001505, whisper_loss=0.09055, over 3869438.52 frames. ], batch size: 58, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:38:36,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3118770.0, ans=0.125 2024-08-15 09:38:48,005 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-15 09:39:07,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3118970.0, ans=0.1 2024-08-15 09:39:08,314 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-15 09:39:31,675 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 09:39:52,961 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7600, loss[loss=0.09605, beats_loss=0.01267, ecapa_loss=0.0001535, whisper_loss=0.08185, over 21526.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001513, whisper_loss=0.09036, over 3874122.60 frames. ], batch size: 88, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:40:03,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3119270.0, ans=0.125 2024-08-15 09:40:07,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3119270.0, ans=0.125 2024-08-15 09:40:15,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3119370.0, ans=0.1 2024-08-15 09:40:29,842 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 09:40:33,219 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.751e+00 2024-08-15 09:40:35,755 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-15 09:40:54,774 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 09:41:07,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3119670.0, ans=0.125 2024-08-15 09:41:09,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.254e+01 2.450e+01 2.637e+01 4.565e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-15 09:41:09,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7650, loss[loss=0.08855, beats_loss=0.01259, ecapa_loss=0.0001605, whisper_loss=0.07435, over 21794.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001518, whisper_loss=0.09042, over 3864182.33 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:41:20,819 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 09:41:48,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3119970.0, ans=0.05 2024-08-15 09:42:03,319 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 09:42:05,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3120070.0, ans=0.04949747468305833 2024-08-15 09:42:18,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2024-08-15 09:42:21,180 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 09:42:27,108 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7700, loss[loss=0.09594, beats_loss=0.01008, ecapa_loss=0.0001374, whisper_loss=0.08449, over 19078.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001521, whisper_loss=0.09046, over 3878033.18 frames. 
], batch size: 76, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:42:48,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3120370.0, ans=0.09899494936611666 2024-08-15 09:42:49,535 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 09:43:06,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3120470.0, ans=0.1 2024-08-15 09:43:13,089 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 09:43:26,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3120570.0, ans=0.0 2024-08-15 09:43:32,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3120670.0, ans=0.1 2024-08-15 09:43:37,501 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 09:43:46,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-08-15 09:43:47,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.349e+01 2.670e+01 3.107e+01 2.208e+02, threshold=5.341e+01, percent-clipped=1.0 2024-08-15 09:43:47,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7750, loss[loss=0.1097, beats_loss=0.01053, ecapa_loss=0.0001598, whisper_loss=0.09757, over 22358.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001522, whisper_loss=0.09023, over 3879533.27 frames. 
], batch size: 94, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:43:52,821 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.64 vs. limit=10.0 2024-08-15 09:43:56,484 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 09:44:30,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3120970.0, ans=0.125 2024-08-15 09:44:31,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3120970.0, ans=0.0 2024-08-15 09:44:37,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3121070.0, ans=0.2 2024-08-15 09:44:38,433 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 09:44:38,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3121070.0, ans=0.0 2024-08-15 09:44:46,499 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-15 09:44:49,424 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 09:44:58,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3121170.0, ans=0.125 2024-08-15 09:45:05,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7800, loss[loss=0.09953, beats_loss=0.01209, ecapa_loss=0.0001656, whisper_loss=0.08578, over 22634.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001515, whisper_loss=0.09031, over 3854338.05 frames. 
], batch size: 93, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:45:22,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3121370.0, ans=0.125 2024-08-15 09:45:23,344 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 09:45:31,135 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 10 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 09:45:49,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3121470.0, ans=0.125 2024-08-15 09:45:58,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3121570.0, ans=15.0 2024-08-15 09:46:05,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3121570.0, ans=0.1 2024-08-15 09:46:14,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3121670.0, ans=0.125 2024-08-15 09:46:24,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3121770.0, ans=0.07 2024-08-15 09:46:25,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.314e+01 2.626e+01 3.075e+01 3.695e+02, threshold=5.252e+01, percent-clipped=2.0 2024-08-15 09:46:25,138 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7850, loss[loss=0.1042, beats_loss=0.007698, ecapa_loss=0.0001794, whisper_loss=0.09473, over 17577.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001512, whisper_loss=0.09051, over 3851949.02 frames. 
], batch size: 66, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:46:36,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3121770.0, ans=0.125 2024-08-15 09:46:52,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3121970.0, ans=0.05 2024-08-15 09:47:05,246 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 09:47:24,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2024-08-15 09:47:33,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7900, loss[loss=0.1171, beats_loss=0.01086, ecapa_loss=0.0001648, whisper_loss=0.1046, over 22486.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001504, whisper_loss=0.09032, over 3870193.58 frames. ], batch size: 91, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:47:38,298 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 09:47:42,294 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 09:47:42,658 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.601e+05 2024-08-15 09:47:45,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.28 vs. limit=22.5 2024-08-15 09:47:50,332 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 09:47:51,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3122370.0, ans=0.1 2024-08-15 09:47:53,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3122370.0, ans=0.125 2024-08-15 09:48:25,106 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 09:48:30,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3122670.0, ans=0.1 2024-08-15 09:48:33,758 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:48:36,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3122670.0, ans=0.2 2024-08-15 09:48:44,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.277e+01 2.656e+01 2.964e+01 2.137e+02, threshold=5.312e+01, percent-clipped=1.0 2024-08-15 09:48:44,850 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 7950, loss[loss=0.09639, beats_loss=0.009006, ecapa_loss=0.0001646, whisper_loss=0.08574, over 17511.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001514, whisper_loss=0.09021, over 3841733.27 frames. ], batch size: 72, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:48:50,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3122770.0, ans=0.125 2024-08-15 09:48:57,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=15.0 2024-08-15 09:49:21,299 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 09:49:24,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3122970.0, ans=0.125 2024-08-15 09:49:28,490 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 09:49:48,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-15 09:49:56,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3123170.0, ans=0.2 2024-08-15 09:49:58,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8000, loss[loss=0.08002, beats_loss=0.01554, ecapa_loss=0.0001382, whisper_loss=0.0631, over 21991.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001502, whisper_loss=0.09106, over 3874566.17 frames. ], batch size: 91, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:50:07,925 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 09:50:21,018 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 09:50:28,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2024-08-15 09:50:34,192 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 09:50:41,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3123570.0, ans=0.125 2024-08-15 09:51:13,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.280e+01 2.544e+01 2.885e+01 5.910e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-15 09:51:13,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8050, loss[loss=0.09787, beats_loss=0.01212, ecapa_loss=0.0001515, whisper_loss=0.08423, over 16088.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01056, ecapa_loss=0.0001498, whisper_loss=0.09168, over 3877677.07 frames. ], batch size: 64, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:51:13,935 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 09:51:14,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.28 vs. limit=10.0 2024-08-15 09:51:16,663 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 09:51:52,610 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-15 09:51:52,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3123970.0, ans=0.125 2024-08-15 09:51:52,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3123970.0, ans=0.04949747468305833 2024-08-15 09:51:55,259 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-15 09:51:58,432 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 09:51:59,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=22.5 2024-08-15 09:52:00,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3124070.0, ans=0.1 2024-08-15 09:52:14,305 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 09:52:25,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8100, loss[loss=0.1133, beats_loss=0.009512, ecapa_loss=0.0001395, whisper_loss=0.1024, over 23466.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001475, whisper_loss=0.09125, over 3872561.66 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:53:05,649 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:53:06,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3124470.0, ans=0.125 2024-08-15 09:53:19,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3124570.0, ans=0.125 2024-08-15 09:53:20,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.42 vs. 
limit=12.0 2024-08-15 09:53:22,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3124570.0, ans=0.125 2024-08-15 09:53:38,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3124670.0, ans=0.0 2024-08-15 09:53:40,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.386e+01 2.609e+01 2.958e+01 3.972e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-15 09:53:40,781 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8150, loss[loss=0.1128, beats_loss=0.007669, ecapa_loss=0.0001723, whisper_loss=0.1034, over 17414.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0001485, whisper_loss=0.09207, over 3854202.52 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:53:49,848 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-15 09:53:54,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3124770.0, ans=0.02 2024-08-15 09:53:55,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3124770.0, ans=0.125 2024-08-15 09:53:58,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3124870.0, ans=0.2 2024-08-15 09:54:30,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3124970.0, ans=0.2 2024-08-15 09:54:31,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.72 vs. 
limit=15.0 2024-08-15 09:54:41,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3125070.0, ans=0.125 2024-08-15 09:54:44,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-15 09:54:45,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-15 09:54:47,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3125070.0, ans=0.05 2024-08-15 09:54:48,191 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 09:54:56,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-15 09:54:59,345 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 09:55:06,859 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8200, loss[loss=0.1094, beats_loss=0.01151, ecapa_loss=0.000136, whisper_loss=0.09649, over 21331.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01055, ecapa_loss=0.0001486, whisper_loss=0.0915, over 3898154.19 frames. ], batch size: 84, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:55:33,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3125370.0, ans=0.1 2024-08-15 09:55:35,903 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 09:55:38,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.04 vs. 
limit=5.0
2024-08-15 09:55:40,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3125470.0, ans=0.125
2024-08-15 09:55:47,882 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 27 from Vox, 30 from AS
2024-08-15 09:55:51,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3125470.0, ans=0.2
2024-08-15 09:56:05,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3125570.0, ans=0.1
2024-08-15 09:56:20,942 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08162933588027954, model_norm_threshold=52.17145538330078
2024-08-15 09:56:21,112 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.265e+04, grad_sumsq=3.250e+06, orig_rms_sq=1.005e-02
2024-08-15 09:56:23,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.208e+01 2.504e+01 2.791e+01 6.391e+02, threshold=5.008e+01, percent-clipped=1.0
2024-08-15 09:56:23,738 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8250, loss[loss=0.09363, beats_loss=0.01046, ecapa_loss=0.0001686, whisper_loss=0.08148, over 21762.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001502, whisper_loss=0.09134, over 3870439.85 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:56:41,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3125870.0, ans=0.125
2024-08-15 09:56:43,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=22.5
2024-08-15 09:56:45,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0
2024-08-15 09:56:46,807 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 from AS
2024-08-15 09:56:47,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3125870.0, ans=0.0
2024-08-15 09:56:51,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3125870.0, ans=0.0
2024-08-15 09:57:04,758 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0
2024-08-15 09:57:18,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126070.0, ans=0.1
2024-08-15 09:57:21,241 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 09:57:27,256 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS
2024-08-15 09:57:37,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8300, loss[loss=0.09813, beats_loss=0.008011, ecapa_loss=0.0001734, whisper_loss=0.08839, over 15081.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001502, whisper_loss=0.09145, over 3862531.84 frames. ], batch size: 59, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:57:38,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3126270.0, ans=0.0
2024-08-15 09:57:38,759 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.054e+01
2024-08-15 09:57:39,697 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 from AS
2024-08-15 09:57:46,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3126270.0, ans=0.0
2024-08-15 09:57:47,336 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 10 from Vox, 35 from AS
2024-08-15 09:58:03,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3126370.0, ans=0.125
2024-08-15 09:58:07,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3126370.0, ans=0.125
2024-08-15 09:58:11,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0
2024-08-15 09:58:13,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3126470.0, ans=0.125
2024-08-15 09:58:23,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3126470.0, ans=0.09899494936611666
2024-08-15 09:58:23,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3126470.0, ans=0.125
2024-08-15 09:58:23,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3126470.0, ans=0.2
2024-08-15 09:58:33,089 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 15 from Vox, 41 from AS
2024-08-15 09:58:40,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126570.0, ans=0.1
2024-08-15 09:58:44,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3126670.0, ans=0.0
2024-08-15 09:59:00,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.411e+01 2.673e+01 3.013e+01 2.459e+02, threshold=5.345e+01, percent-clipped=1.0
2024-08-15 09:59:00,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8350, loss[loss=0.1087, beats_loss=0.01082, ecapa_loss=0.0001687, whisper_loss=0.09616, over 22162.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01052, ecapa_loss=0.0001505, whisper_loss=0.09189, over 3871855.41 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:59:10,356 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 18 from LS+wenet, 32 from Vox, 43 from AS
2024-08-15 09:59:11,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0
2024-08-15 09:59:32,595 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 25 from Vox, 34 from AS
2024-08-15 09:59:34,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0
2024-08-15 09:59:56,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3127070.0, ans=0.0
2024-08-15 10:00:12,377 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 23 from Vox, 31 from AS
2024-08-15 10:00:16,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8400, loss[loss=0.09582, beats_loss=0.0123, ecapa_loss=0.0001419, whisper_loss=0.0821, over 21721.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001504, whisper_loss=0.09127, over 3893542.40 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:00:24,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3127270.0, ans=0.2
2024-08-15 10:00:27,270 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 from AS
2024-08-15 10:00:30,110 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 from AS
2024-08-15 10:00:34,372 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 from AS
2024-08-15 10:00:40,419 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 10:00:46,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3127470.0, ans=0.1
2024-08-15 10:00:48,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3127470.0, ans=0.0
2024-08-15 10:01:02,373 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS
2024-08-15 10:01:05,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=3127570.0, ans=12.0
2024-08-15 10:01:08,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=12.0
2024-08-15 10:01:26,809 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 15 from Vox, 48 from AS
2024-08-15 10:01:31,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.355e+01 2.603e+01 2.827e+01 7.121e+01, threshold=5.205e+01, percent-clipped=1.0
2024-08-15 10:01:31,266 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8450, loss[loss=0.08484, beats_loss=0.01043, ecapa_loss=0.0001319, whisper_loss=0.07309, over 14583.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01052, ecapa_loss=0.0001516, whisper_loss=0.0917, over 3922629.34 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:01:52,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3127870.0, ans=0.1
2024-08-15 10:02:25,011 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 from AS
2024-08-15 10:02:29,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3128070.0, ans=0.125
2024-08-15 10:02:41,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3128170.0, ans=0.125
2024-08-15 10:02:52,273 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8500, loss[loss=0.08514, beats_loss=0.01112, ecapa_loss=0.0001479, whisper_loss=0.07255, over 17020.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001526, whisper_loss=0.09135, over 3925929.54 frames. ], batch size: 69, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:02:52,545 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 24 from Vox, 25 from AS
2024-08-15 10:02:54,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=22.5
2024-08-15 10:03:17,279 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 16 from Vox, 22 from AS
2024-08-15 10:03:33,442 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 19 from Vox, 24 from AS
2024-08-15 10:03:50,087 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 from AS
2024-08-15 10:03:50,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3128570.0, ans=0.2
2024-08-15 10:04:00,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3128670.0, ans=0.125
2024-08-15 10:04:08,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3128670.0, ans=15.0
2024-08-15 10:04:11,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.400e+01 2.720e+01 3.121e+01 2.458e+02, threshold=5.440e+01, percent-clipped=2.0
2024-08-15 10:04:11,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8550, loss[loss=0.1276, beats_loss=0.009996, ecapa_loss=0.0001638, whisper_loss=0.116, over 15703.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01045, ecapa_loss=0.0001519, whisper_loss=0.09198, over 3906167.67 frames. ], batch size: 63, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:04:29,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128870.0, ans=0.1
2024-08-15 10:04:39,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0
2024-08-15 10:04:42,693 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS
2024-08-15 10:05:06,724 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0
2024-08-15 10:05:12,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0
2024-08-15 10:05:16,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3129170.0, ans=0.125
2024-08-15 10:05:25,960 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8600, loss[loss=0.09027, beats_loss=0.01276, ecapa_loss=0.0001394, whisper_loss=0.07612, over 20911.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01047, ecapa_loss=0.0001519, whisper_loss=0.09192, over 3914474.27 frames. ], batch size: 87, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:05:44,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3129370.0, ans=0.125
2024-08-15 10:05:59,900 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 from AS
2024-08-15 10:06:01,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3129470.0, ans=0.125
2024-08-15 10:06:03,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3129470.0, ans=0.1
2024-08-15 10:06:03,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0
2024-08-15 10:06:16,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3129570.0, ans=0.0
2024-08-15 10:06:16,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3129570.0, ans=0.125
2024-08-15 10:06:32,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3129670.0, ans=0.2
2024-08-15 10:06:37,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.410e+01 2.647e+01 2.940e+01 4.400e+01, threshold=5.294e+01, percent-clipped=0.0
2024-08-15 10:06:37,951 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8650, loss[loss=0.1073, beats_loss=0.01342, ecapa_loss=0.0001295, whisper_loss=0.09263, over 23091.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01045, ecapa_loss=0.0001523, whisper_loss=0.09191, over 3879425.58 frames. ], batch size: 94, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:06:39,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3129770.0, ans=0.125
2024-08-15 10:06:49,588 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.548e-03
2024-08-15 10:06:52,238 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 from AS
2024-08-15 10:06:54,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3129870.0, ans=0.0
2024-08-15 10:07:16,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3129970.0, ans=0.0
2024-08-15 10:07:17,593 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 10:07:37,265 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 11 from Vox, 38 from AS
2024-08-15 10:07:43,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3130170.0, ans=0.2
2024-08-15 10:08:00,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8700, loss[loss=0.09367, beats_loss=0.01292, ecapa_loss=0.0001384, whisper_loss=0.07937, over 22752.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.0907, over 3892132.88 frames. ], batch size: 94, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:08:04,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3130270.0, ans=0.125
2024-08-15 10:08:05,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3130270.0, ans=0.125
2024-08-15 10:08:05,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3130270.0, ans=0.125
2024-08-15 10:08:09,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=3130270.0, ans=15.0
2024-08-15 10:08:10,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3130270.0, ans=0.125
2024-08-15 10:08:15,123 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS
2024-08-15 10:08:17,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3130370.0, ans=0.125
2024-08-15 10:08:26,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3130370.0, ans=0.125
2024-08-15 10:08:31,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=12.0
2024-08-15 10:08:45,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3130470.0, ans=0.09899494936611666
2024-08-15 10:08:50,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=8.0
2024-08-15 10:08:53,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-08-15 10:09:14,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-08-15 10:09:22,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.333e+01 2.545e+01 2.859e+01 2.640e+02, threshold=5.090e+01, percent-clipped=1.0
2024-08-15 10:09:22,527 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8750, loss[loss=0.1089, beats_loss=0.008465, ecapa_loss=0.0001304, whisper_loss=0.09917, over 15698.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001519, whisper_loss=0.09077, over 3865012.47 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:09:24,441 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 34 from LS+wenet, 13 from Vox, 29 from AS
2024-08-15 10:09:41,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0
2024-08-15 10:09:52,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3130870.0, ans=0.125
2024-08-15 10:09:53,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3130870.0, ans=0.1
2024-08-15 10:10:19,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3131070.0, ans=0.0
2024-08-15 10:10:29,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3131170.0, ans=0.1
2024-08-15 10:10:36,309 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 12 from Vox, 34 from AS
2024-08-15 10:10:37,634 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 19 from Vox, 42 from AS
2024-08-15 10:10:41,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8800, loss[loss=0.1169, beats_loss=0.009269, ecapa_loss=0.0001817, whisper_loss=0.1058, over 18675.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001525, whisper_loss=0.0909, over 3878122.97 frames. ], batch size: 76, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:10:42,225 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.71 vs. limit=22.5
2024-08-15 10:11:02,594 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.585e-01
2024-08-15 10:11:07,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3131370.0, ans=0.0
2024-08-15 10:11:17,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0
2024-08-15 10:11:29,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3131570.0, ans=0.1
2024-08-15 10:11:34,925 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 from AS
2024-08-15 10:11:40,890 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 12 from Vox, 30 from AS
2024-08-15 10:11:43,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3131670.0, ans=0.0
2024-08-15 10:11:50,281 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 from AS
2024-08-15 10:11:54,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.280e+01 2.531e+01 2.796e+01 4.372e+01, threshold=5.061e+01, percent-clipped=0.0
2024-08-15 10:11:54,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8850, loss[loss=0.1138, beats_loss=0.01155, ecapa_loss=0.00011, whisper_loss=0.1011, over 23481.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001503, whisper_loss=0.09123, over 3893904.61 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:11:56,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3131770.0, ans=0.0
2024-08-15 10:12:09,210 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 from AS
2024-08-15 10:12:12,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3131870.0, ans=0.125
2024-08-15 10:12:27,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3131970.0, ans=0.125
2024-08-15 10:12:58,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3132170.0, ans=0.125
2024-08-15 10:13:01,080 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 from AS
2024-08-15 10:13:01,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3132170.0, ans=0.125
2024-08-15 10:13:08,898 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8900, loss[loss=0.07065, beats_loss=0.01098, ecapa_loss=0.0002044, whisper_loss=0.05763, over 16476.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001521, whisper_loss=0.09085, over 3876243.47 frames. ], batch size: 74, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:13:16,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3132270.0, ans=0.125
2024-08-15 10:13:24,752 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 14 from LS+wenet, 22 from Vox, 39 from AS
2024-08-15 10:13:30,577 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 from AS
2024-08-15 10:13:34,134 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 17 from Vox, 21 from AS
2024-08-15 10:13:56,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3132570.0, ans=0.0
2024-08-15 10:14:08,113 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 from AS
2024-08-15 10:14:18,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3132770.0, ans=0.0
2024-08-15 10:14:18,807 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.382e+01 2.533e+01 2.856e+01 1.200e+02, threshold=5.066e+01, percent-clipped=2.0
2024-08-15 10:14:18,830 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 8950, loss[loss=0.1149, beats_loss=0.01052, ecapa_loss=0.0001577, whisper_loss=0.1028, over 15212.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001517, whisper_loss=0.09095, over 3877157.03 frames. ], batch size: 60, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:14:48,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3132970.0, ans=0.2
2024-08-15 10:14:50,131 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 from AS
2024-08-15 10:15:04,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3133070.0, ans=0.2
2024-08-15 10:15:16,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3133170.0, ans=0.025
2024-08-15 10:15:20,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3133170.0, ans=0.0
2024-08-15 10:15:21,263 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 from AS
2024-08-15 10:15:29,233 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9000, loss[loss=0.07322, beats_loss=0.01409, ecapa_loss=0.0001645, whisper_loss=0.05749, over 12779.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.000152, whisper_loss=0.09052, over 3856596.01 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:15:29,233 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-15 10:16:12,835 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005364, whisper_loss=0.2468, over 922467.00 frames.
2024-08-15 10:16:35,677 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on SV_voxceleb1: loss=0.004068, beats_loss=0, ecapa_loss=0.0004068, whisper_loss=0, over 939242.00 frames.
2024-08-15 10:18:37,890 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on AT_audioset: loss=0.02332, beats_loss=0.02332, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-15 10:18:37,894 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-15 10:18:49,242 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 25 from Vox, 31 from AS
2024-08-15 10:19:06,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3133470.0, ans=0.0
2024-08-15 10:19:14,397 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 from AS
2024-08-15 10:19:23,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.21 vs. limit=10.0
2024-08-15 10:19:40,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3133670.0, ans=0.125
2024-08-15 10:19:48,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.223e+01 2.558e+01 2.882e+01 3.996e+01, threshold=5.117e+01, percent-clipped=0.0
2024-08-15 10:19:48,067 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9050, loss[loss=0.1388, beats_loss=0.009078, ecapa_loss=0.000135, whisper_loss=0.1284, over 23726.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.09163, over 3862591.76 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:20:20,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3133970.0, ans=0.2
2024-08-15 10:20:31,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3134070.0, ans=0.125
2024-08-15 10:20:57,589 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9100, loss[loss=0.1058, beats_loss=0.00988, ecapa_loss=0.0001747, whisper_loss=0.09421, over 19862.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001528, whisper_loss=0.09192, over 3872099.59 frames. ], batch size: 82, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:21:02,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3134270.0, ans=0.125
2024-08-15 10:21:22,068 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.968e-02
2024-08-15 10:21:28,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3134470.0, ans=0.0
2024-08-15 10:21:40,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0
2024-08-15 10:21:44,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3134570.0, ans=0.125
2024-08-15 10:21:44,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3134570.0, ans=0.0
2024-08-15 10:21:45,980 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.636e+01
2024-08-15 10:21:48,525 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 from AS
2024-08-15 10:22:01,773 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 from AS
2024-08-15 10:22:08,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3134770.0, ans=0.0
2024-08-15 10:22:08,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.360e+01 2.655e+01 2.981e+01 2.632e+02, threshold=5.310e+01, percent-clipped=2.0
2024-08-15 10:22:08,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9150, loss[loss=0.08924, beats_loss=0.01187, ecapa_loss=0.0001427, whisper_loss=0.07595, over 14841.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001529, whisper_loss=0.09151, over 3857789.02 frames. ], batch size: 60, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:22:13,631 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 10:22:23,054 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 from AS
2024-08-15 10:22:28,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3134870.0, ans=0.0
2024-08-15 10:22:34,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0
2024-08-15 10:22:37,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3134970.0, ans=0.125
2024-08-15 10:22:38,889 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.327e-02
2024-08-15 10:22:44,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.53 vs. limit=12.0
2024-08-15 10:22:46,930 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 23 from Vox, 20 from AS
2024-08-15 10:23:09,367 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 from AS
2024-08-15 10:23:17,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3135170.0, ans=0.0
2024-08-15 10:23:17,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3135170.0, ans=0.1
2024-08-15 10:23:20,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3135170.0, ans=0.0
2024-08-15 10:23:20,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3135170.0, ans=0.0
2024-08-15 10:23:23,191 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9200, loss[loss=0.09358, beats_loss=0.01179, ecapa_loss=0.0001309, whisper_loss=0.08048, over 21378.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001531, whisper_loss=0.09102, over 3864133.25 frames. ], batch size: 84, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:23:31,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3135270.0, ans=0.035
2024-08-15 10:23:32,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0
2024-08-15 10:23:46,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3135370.0, ans=0.0
2024-08-15 10:24:26,848 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 from AS
2024-08-15 10:24:46,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.313e+01 2.628e+01 2.847e+01 1.469e+02, threshold=5.255e+01, percent-clipped=1.0
2024-08-15 10:24:46,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9250, loss[loss=0.1037, beats_loss=0.01177, ecapa_loss=0.0001556, whisper_loss=0.09034, over 22208.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001535, whisper_loss=0.09139, over 3892619.63 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:24:46,872 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 12 from Vox, 31 from AS
2024-08-15 10:25:16,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3135870.0, ans=0.125
2024-08-15 10:25:16,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3135870.0, ans=0.0
2024-08-15 10:25:40,458 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 from AS
2024-08-15 10:25:45,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3136070.0, ans=0.125
2024-08-15 10:25:48,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3136070.0, ans=0.125
2024-08-15 10:25:54,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3136070.0, ans=0.1
2024-08-15 10:25:55,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3136170.0, ans=0.1
2024-08-15 10:25:57,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3136170.0, ans=0.0
2024-08-15 10:26:04,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0
2024-08-15 10:26:08,816 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 29 from Vox, 39 from AS
2024-08-15 10:26:13,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9300, loss[loss=0.09305, beats_loss=0.01206, ecapa_loss=0.0001049, whisper_loss=0.07995, over 24281.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01049, ecapa_loss=0.0001519, whisper_loss=0.09147, over 3883174.66 frames. ], batch size: 94, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:26:26,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3136270.0, ans=15.0
2024-08-15 10:26:33,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3136370.0, ans=0.1
2024-08-15 10:26:51,529 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.693e+01
2024-08-15 10:27:00,426 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 from AS
2024-08-15 10:27:11,487 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 10:27:15,615 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 30 from Vox, 30 from AS
2024-08-15 10:27:24,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3136670.0, ans=0.0
2024-08-15 10:27:26,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.363e+01 2.589e+01 2.922e+01 5.036e+01, threshold=5.179e+01, percent-clipped=0.0
2024-08-15 10:27:26,498 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9350, loss[loss=0.07181, beats_loss=0.01173, ecapa_loss=0.0001728, whisper_loss=0.05835, over 17140.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001531, whisper_loss=0.09133, over 3880615.51 frames. ], batch size: 71, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:27:39,174 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts.
28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 10:27:50,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3136870.0, ans=0.125 2024-08-15 10:28:04,823 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 10:28:05,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-15 10:28:09,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3137070.0, ans=0.125 2024-08-15 10:28:16,764 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.986e-01 2024-08-15 10:28:24,421 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.41 vs. limit=22.5 2024-08-15 10:28:35,959 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9400, loss[loss=0.09684, beats_loss=0.01207, ecapa_loss=0.0001456, whisper_loss=0.08331, over 22739.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001523, whisper_loss=0.09082, over 3846858.47 frames. ], batch size: 91, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:28:36,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3137270.0, ans=0.04949747468305833 2024-08-15 10:28:38,800 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 10:28:51,333 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
31 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 10:28:57,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3137370.0, ans=0.0 2024-08-15 10:28:58,572 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 10:29:00,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3137370.0, ans=0.1 2024-08-15 10:29:01,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137370.0, ans=0.125 2024-08-15 10:29:02,423 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-15 10:29:06,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3137470.0, ans=0.0 2024-08-15 10:29:06,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3137470.0, ans=0.0 2024-08-15 10:29:19,314 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-15 10:29:29,834 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
20 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-15 10:29:37,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3137670.0, ans=0.125 2024-08-15 10:29:39,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3137670.0, ans=10.0 2024-08-15 10:29:45,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.347e+01 2.545e+01 2.871e+01 4.993e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-15 10:29:45,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9450, loss[loss=0.1047, beats_loss=0.008547, ecapa_loss=0.0001605, whisper_loss=0.09452, over 19599.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001531, whisper_loss=0.09015, over 3831926.03 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:30:02,131 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 10:30:09,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3137870.0, ans=0.125 2024-08-15 10:30:11,010 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.546e-02 2024-08-15 10:30:25,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3138070.0, ans=0.125 2024-08-15 10:30:52,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3138170.0, ans=0.2 2024-08-15 10:30:54,639 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9500, loss[loss=0.1097, beats_loss=0.009333, ecapa_loss=0.0001535, whisper_loss=0.09878, over 22679.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001526, whisper_loss=0.0906, over 3872344.68 frames. 
], batch size: 89, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:31:03,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3138270.0, ans=0.125 2024-08-15 10:31:10,410 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.248e+00 2024-08-15 10:31:19,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3138370.0, ans=0.0 2024-08-15 10:31:27,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3138470.0, ans=0.2 2024-08-15 10:31:43,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=3138570.0, ans=12.0 2024-08-15 10:31:56,599 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-15 10:31:57,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0 2024-08-15 10:31:59,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3138670.0, ans=0.125 2024-08-15 10:32:03,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.464e+01 2.657e+01 3.059e+01 1.936e+02, threshold=5.313e+01, percent-clipped=3.0 2024-08-15 10:32:03,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9550, loss[loss=0.09107, beats_loss=0.01105, ecapa_loss=0.0001463, whisper_loss=0.07856, over 21977.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001525, whisper_loss=0.09004, over 3877764.32 frames. 
], batch size: 90, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:32:20,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138870.0, ans=0.1 2024-08-15 10:32:23,418 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 13 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 10:32:25,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2024-08-15 10:32:50,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-08-15 10:32:54,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2024-08-15 10:32:59,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139070.0, ans=0.1 2024-08-15 10:33:01,613 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 10:33:13,342 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-15 10:33:15,641 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9600, loss[loss=0.1094, beats_loss=0.01224, ecapa_loss=0.0001399, whisper_loss=0.09573, over 22514.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001519, whisper_loss=0.09051, over 3885184.33 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:33:29,225 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 10:33:32,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3139370.0, ans=0.0 2024-08-15 10:33:34,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2024-08-15 10:33:46,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3139470.0, ans=0.2 2024-08-15 10:34:00,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=15.0 2024-08-15 10:34:20,026 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-15 10:34:22,854 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 10:34:26,667 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9650, loss[loss=0.1341, beats_loss=0.009435, ecapa_loss=0.0001353, whisper_loss=0.1233, over 17379.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001513, whisper_loss=0.09019, over 3843863.14 frames. ], batch size: 66, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:34:27,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2024-08-15 10:34:27,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.224e+01 2.493e+01 2.795e+01 4.633e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-15 10:34:28,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3139770.0, ans=0.0 2024-08-15 10:34:35,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139770.0, ans=0.1 2024-08-15 10:34:47,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3139870.0, ans=0.125 2024-08-15 10:34:54,498 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 13 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 10:35:05,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3139970.0, ans=0.0 2024-08-15 10:35:33,881 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2024-08-15 10:35:36,017 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9700, loss[loss=0.09505, beats_loss=0.01242, ecapa_loss=0.000121, whisper_loss=0.08142, over 18625.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001511, whisper_loss=0.08948, over 3837542.04 frames. ], batch size: 73, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:35:46,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3140270.0, ans=0.125 2024-08-15 10:35:49,778 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 10:35:57,475 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.36 vs. limit=6.0 2024-08-15 10:35:57,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2024-08-15 10:36:09,302 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 39 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-15 10:36:19,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2024-08-15 10:36:27,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3140570.0, ans=0.0 2024-08-15 10:36:31,538 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 10:36:36,890 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 14 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 10:36:45,566 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9750, loss[loss=0.09922, beats_loss=0.01033, ecapa_loss=0.0001679, whisper_loss=0.08721, over 16911.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001515, whisper_loss=0.09041, over 3845401.07 frames. ], batch size: 67, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:36:46,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.354e+01 2.591e+01 2.841e+01 9.647e+01, threshold=5.183e+01, percent-clipped=2.0 2024-08-15 10:37:03,702 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
20 from LS+wenet, 38 from Vox, 35 fro AS 2024-08-15 10:37:13,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3140970.0, ans=0.0 2024-08-15 10:37:16,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3140970.0, ans=0.1 2024-08-15 10:37:27,742 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 38 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 10:37:38,953 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 10:37:46,302 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 14 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-15 10:37:55,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-08-15 10:37:55,827 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9800, loss[loss=0.1146, beats_loss=0.0105, ecapa_loss=0.0001625, whisper_loss=0.1025, over 19098.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001518, whisper_loss=0.09033, over 3855331.12 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:38:01,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3141270.0, ans=0.125 2024-08-15 10:38:14,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3141370.0, ans=0.2 2024-08-15 10:38:18,646 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 10:38:24,218 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
19 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-15 10:38:38,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141570.0, ans=0.1 2024-08-15 10:38:39,922 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 10:38:42,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3141570.0, ans=0.2 2024-08-15 10:38:44,828 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 10:39:05,440 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9850, loss[loss=0.07435, beats_loss=0.01158, ecapa_loss=0.000148, whisper_loss=0.06128, over 14985.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001524, whisper_loss=0.09049, over 3837853.15 frames. ], batch size: 60, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:39:06,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.300e+01 2.506e+01 2.923e+01 9.908e+01, threshold=5.012e+01, percent-clipped=1.0 2024-08-15 10:39:11,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3141770.0, ans=0.0 2024-08-15 10:39:19,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3141870.0, ans=0.125 2024-08-15 10:39:25,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3141870.0, ans=0.2 2024-08-15 10:39:34,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=12.0 2024-08-15 10:39:45,178 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
16 from LS+wenet, 31 from Vox, 19 fro AS 2024-08-15 10:40:03,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3142170.0, ans=0.04949747468305833 2024-08-15 10:40:13,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9900, loss[loss=0.09655, beats_loss=0.00969, ecapa_loss=0.0001804, whisper_loss=0.08506, over 22284.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001519, whisper_loss=0.08977, over 3820071.05 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:40:37,183 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 10:40:37,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3142370.0, ans=0.0 2024-08-15 10:40:47,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0 2024-08-15 10:40:50,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2024-08-15 10:40:52,368 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 10:41:05,968 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 10:41:19,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.76 vs. limit=8.0 2024-08-15 10:41:22,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 9950, loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001448, whisper_loss=0.09185, over 19950.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001503, whisper_loss=0.08977, over 3843929.18 frames. 
], batch size: 77, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:41:24,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.640e+01 2.920e+01 4.147e+01, threshold=5.279e+01, percent-clipped=0.0 2024-08-15 10:41:25,759 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:41:33,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3142770.0, ans=0.0 2024-08-15 10:41:49,365 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 10:41:53,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3142970.0, ans=0.2 2024-08-15 10:42:08,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3143070.0, ans=0.125 2024-08-15 10:42:09,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3143070.0, ans=0.1 2024-08-15 10:42:12,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3143070.0, ans=0.0 2024-08-15 10:42:21,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3143170.0, ans=0.1 2024-08-15 10:42:27,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3143170.0, ans=0.2 2024-08-15 10:42:30,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3143170.0, ans=0.125 2024-08-15 10:42:32,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3143270.0, 
ans=0.125 2024-08-15 10:42:32,803 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10000, loss[loss=0.09643, beats_loss=0.01149, ecapa_loss=0.0001461, whisper_loss=0.08348, over 22981.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.0001508, whisper_loss=0.09, over 3827377.67 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:42:42,745 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 10:42:49,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3143370.0, ans=0.125 2024-08-15 10:43:14,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2024-08-15 10:43:22,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3143570.0, ans=0.125 2024-08-15 10:43:29,240 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 10:43:32,983 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 10:43:36,401 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 10:43:42,838 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-15 10:43:50,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10050, loss[loss=0.09796, beats_loss=0.01134, ecapa_loss=0.0001524, whisper_loss=0.0851, over 21617.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001507, whisper_loss=0.09064, over 3867872.83 frames. 
], batch size: 89, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:43:53,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.380e+01 2.609e+01 2.956e+01 1.893e+02, threshold=5.219e+01, percent-clipped=1.0 2024-08-15 10:44:04,990 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 10:44:06,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3143870.0, ans=0.0 2024-08-15 10:44:06,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3143870.0, ans=0.125 2024-08-15 10:44:20,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3143870.0, ans=0.125 2024-08-15 10:44:23,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3143870.0, ans=0.125 2024-08-15 10:44:54,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3144070.0, ans=0.125 2024-08-15 10:45:23,651 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 10:45:28,056 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10100, loss[loss=0.08736, beats_loss=0.01171, ecapa_loss=0.0001094, whisper_loss=0.07455, over 16339.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001522, whisper_loss=0.0916, over 3890610.68 frames. 
], batch size: 60, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:45:28,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3144270.0, ans=0.125 2024-08-15 10:45:45,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3144270.0, ans=0.1 2024-08-15 10:46:12,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3144470.0, ans=0.125 2024-08-15 10:46:55,775 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 10:47:09,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3144670.0, ans=0.125 2024-08-15 10:47:11,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3144670.0, ans=0.0 2024-08-15 10:47:24,275 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10150, loss[loss=0.1121, beats_loss=0.01006, ecapa_loss=0.000152, whisper_loss=0.1006, over 22739.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001533, whisper_loss=0.09097, over 3886827.42 frames. ], batch size: 92, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:47:29,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.324e+01 2.588e+01 2.924e+01 3.968e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-15 10:47:45,997 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-15 10:47:59,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3144870.0, ans=0.0 2024-08-15 10:48:05,796 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 10:48:14,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3144870.0, ans=0.125 2024-08-15 10:48:47,215 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 10:49:00,200 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 12 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 10:49:06,182 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10200, loss[loss=0.1019, beats_loss=0.008835, ecapa_loss=0.0001819, whisper_loss=0.09122, over 19709.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001536, whisper_loss=0.09091, over 3891353.52 frames. ], batch size: 81, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:49:06,734 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 10:49:11,281 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 10:49:38,104 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-15 10:49:44,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3145470.0, ans=0.125 2024-08-15 10:50:23,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10250, loss[loss=0.1123, beats_loss=0.009155, ecapa_loss=0.0001777, whisper_loss=0.1014, over 21806.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001526, whisper_loss=0.09107, over 3876572.48 frames. ], batch size: 88, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:50:26,704 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.258e+01 2.433e+01 2.798e+01 3.625e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-15 10:50:31,253 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 10:50:34,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3145770.0, ans=0.0 2024-08-15 10:50:45,142 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 10:50:51,137 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 10:50:51,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3145870.0, ans=0.95 2024-08-15 10:50:53,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3145970.0, ans=0.04949747468305833 2024-08-15 10:50:56,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3145970.0, ans=0.0 2024-08-15 10:51:13,570 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-15 10:51:28,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3146170.0, ans=0.125 2024-08-15 10:51:36,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3146170.0, ans=0.0 2024-08-15 10:51:42,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10300, loss[loss=0.09923, beats_loss=0.01072, ecapa_loss=0.0001759, whisper_loss=0.08675, over 16761.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.000153, whisper_loss=0.0908, over 3880699.67 frames. 
], batch size: 69, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:51:50,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3146270.0, ans=0.0 2024-08-15 10:52:08,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3146370.0, ans=0.125 2024-08-15 10:52:25,354 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 20 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 10:52:49,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2024-08-15 10:52:58,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3146670.0, ans=0.1 2024-08-15 10:53:05,764 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10350, loss[loss=0.1041, beats_loss=0.01292, ecapa_loss=0.0001374, whisper_loss=0.08978, over 22160.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.000153, whisper_loss=0.09083, over 3909839.54 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:53:08,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.405e+01 2.644e+01 3.063e+01 2.497e+02, threshold=5.287e+01, percent-clipped=1.0 2024-08-15 10:53:10,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3146770.0, ans=0.0 2024-08-15 10:53:15,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=12.0 2024-08-15 10:53:31,068 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
27 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-15 10:53:44,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3146970.0, ans=0.125 2024-08-15 10:54:18,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3147170.0, ans=0.125 2024-08-15 10:54:24,945 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 10:54:28,426 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10400, loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001877, whisper_loss=0.09004, over 20471.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.000152, whisper_loss=0.09116, over 3889802.89 frames. ], batch size: 88, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:54:32,060 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-15 10:55:04,457 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 10:55:04,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3147470.0, ans=0.07 2024-08-15 10:55:10,241 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 10:55:12,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3147470.0, ans=0.125 2024-08-15 10:55:44,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. 
limit=22.5 2024-08-15 10:55:51,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3147770.0, ans=0.125 2024-08-15 10:55:52,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10450, loss[loss=0.09428, beats_loss=0.01054, ecapa_loss=0.0001531, whisper_loss=0.08221, over 21145.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.09063, over 3878831.82 frames. ], batch size: 83, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:55:54,997 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.272e+01 2.480e+01 2.758e+01 4.514e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-15 10:56:10,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3147870.0, ans=0.2 2024-08-15 10:56:24,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3147970.0, ans=0.125 2024-08-15 10:56:53,074 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 10:57:02,180 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 10:57:08,651 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10500, loss[loss=0.09616, beats_loss=0.01092, ecapa_loss=0.0001362, whisper_loss=0.08388, over 18394.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001516, whisper_loss=0.09105, over 3892719.56 frames. ], batch size: 72, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:57:22,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3148270.0, ans=0.125 2024-08-15 10:57:23,638 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
29 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 10:57:29,020 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 10:57:35,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-08-15 10:57:36,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3148370.0, ans=0.2 2024-08-15 10:57:45,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3148470.0, ans=0.125 2024-08-15 10:58:06,792 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 10:58:23,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3148670.0, ans=0.05 2024-08-15 10:58:27,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3148670.0, ans=0.125 2024-08-15 10:58:29,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3148670.0, ans=0.125 2024-08-15 10:58:30,662 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 10:58:31,822 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10550, loss[loss=0.1142, beats_loss=0.009984, ecapa_loss=0.0001495, whisper_loss=0.1027, over 18153.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001513, whisper_loss=0.09049, over 3878099.63 frames. 
], batch size: 69, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:58:34,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.372e+01 2.650e+01 2.883e+01 3.926e+01, threshold=5.299e+01, percent-clipped=0.0 2024-08-15 10:58:50,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3148870.0, ans=0.0 2024-08-15 10:58:53,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3148870.0, ans=0.125 2024-08-15 10:58:56,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3148870.0, ans=0.125 2024-08-15 10:59:12,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3148970.0, ans=0.1 2024-08-15 10:59:18,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-15 10:59:19,525 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 10:59:21,640 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 10:59:23,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-15 10:59:28,366 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2024-08-15 10:59:34,635 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 10:59:37,488 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 10:59:38,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3149170.0, ans=0.1 2024-08-15 10:59:49,026 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10600, loss[loss=0.1053, beats_loss=0.01053, ecapa_loss=0.0001478, whisper_loss=0.09332, over 22612.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001515, whisper_loss=0.09028, over 3929605.02 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:59:52,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3149270.0, ans=0.1 2024-08-15 11:00:19,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3149470.0, ans=0.2 2024-08-15 11:00:42,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3149570.0, ans=0.125 2024-08-15 11:00:51,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3149670.0, ans=0.95 2024-08-15 11:01:06,367 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10650, loss[loss=0.1051, beats_loss=0.01193, ecapa_loss=0.00012, whisper_loss=0.09199, over 22756.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001508, whisper_loss=0.09041, over 3916040.31 frames. 
], batch size: 88, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:01:06,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3149770.0, ans=0.125 2024-08-15 11:01:06,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3149770.0, ans=0.125 2024-08-15 11:01:09,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.413e+01 2.629e+01 2.898e+01 3.897e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-15 11:01:28,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3149870.0, ans=0.1 2024-08-15 11:02:04,681 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 11:02:05,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2024-08-15 11:02:12,416 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 11:02:22,821 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 20 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 11:02:24,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-15 11:02:30,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10700, loss[loss=0.1047, beats_loss=0.01084, ecapa_loss=0.0001222, whisper_loss=0.09259, over 21532.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001503, whisper_loss=0.08986, over 3925005.88 frames. 
], batch size: 83, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:02:46,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3150370.0, ans=0.0 2024-08-15 11:02:47,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3150370.0, ans=0.125 2024-08-15 11:02:50,113 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-15 11:02:56,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3150370.0, ans=0.2 2024-08-15 11:03:03,862 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-15 11:03:10,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3150470.0, ans=0.2 2024-08-15 11:03:23,034 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 11:03:30,651 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-15 11:03:39,052 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 11:03:44,586 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10750, loss[loss=0.1185, beats_loss=0.007837, ecapa_loss=0.0002169, whisper_loss=0.1085, over 19126.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001504, whisper_loss=0.091, over 3937911.21 frames. 
], batch size: 81, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:03:47,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.262e+01 2.469e+01 2.772e+01 4.273e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-15 11:04:02,279 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 11:04:25,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3150970.0, ans=0.125 2024-08-15 11:04:27,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3151070.0, ans=0.05 2024-08-15 11:04:28,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3151070.0, ans=0.125 2024-08-15 11:04:51,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3151170.0, ans=6.0 2024-08-15 11:04:58,155 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10800, loss[loss=0.1122, beats_loss=0.00966, ecapa_loss=0.0001595, whisper_loss=0.101, over 20723.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.00015, whisper_loss=0.09158, over 3933907.02 frames. ], batch size: 84, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:05:00,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2024-08-15 11:05:06,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3151270.0, ans=10.0 2024-08-15 11:05:16,060 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 11:05:18,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2024-08-15 11:05:44,625 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 11:06:02,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3151570.0, ans=0.07 2024-08-15 11:06:18,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3151670.0, ans=0.1 2024-08-15 11:06:22,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10850, loss[loss=0.1063, beats_loss=0.008093, ecapa_loss=0.0001178, whisper_loss=0.09704, over 15954.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01053, ecapa_loss=0.0001501, whisper_loss=0.09226, over 3966686.79 frames. ], batch size: 55, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:06:25,481 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 11:06:26,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.371e+01 2.566e+01 2.885e+01 4.578e+01, threshold=5.132e+01, percent-clipped=0.0 2024-08-15 11:07:07,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3151970.0, ans=0.0 2024-08-15 11:07:31,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=15.0 2024-08-15 11:07:44,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10900, loss[loss=0.09012, beats_loss=0.01233, ecapa_loss=0.000155, whisper_loss=0.07623, over 16810.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001497, whisper_loss=0.09169, over 3948684.83 frames. 
], batch size: 67, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:07:55,937 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 11:08:05,308 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 18 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 11:08:26,962 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 11:08:46,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.65 vs. limit=15.0 2024-08-15 11:09:04,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3152670.0, ans=0.1 2024-08-15 11:09:07,102 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 11:09:09,554 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 10950, loss[loss=0.08696, beats_loss=0.01457, ecapa_loss=0.000128, whisper_loss=0.07111, over 21703.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001496, whisper_loss=0.09126, over 3965858.85 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:09:12,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.384e+01 2.656e+01 2.933e+01 4.855e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-15 11:09:32,801 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 11:09:53,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3152970.0, ans=0.0 2024-08-15 11:09:59,048 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 11:10:25,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11000, loss[loss=0.08091, beats_loss=0.01217, ecapa_loss=0.0001611, whisper_loss=0.06712, over 13496.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001503, whisper_loss=0.09105, over 3961075.07 frames. ], batch size: 55, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:10:32,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2024-08-15 11:10:32,938 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 11:11:01,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3153470.0, ans=0.125 2024-08-15 11:11:13,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153570.0, ans=0.1 2024-08-15 11:11:16,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3153570.0, ans=0.125 2024-08-15 11:11:23,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3153670.0, ans=0.125 2024-08-15 11:11:28,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3153670.0, ans=0.125 2024-08-15 11:11:31,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3153670.0, ans=0.2 2024-08-15 11:11:37,328 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
18 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 11:11:38,499 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11050, loss[loss=0.09235, beats_loss=0.01319, ecapa_loss=0.0001167, whisper_loss=0.078, over 18995.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001496, whisper_loss=0.0902, over 3946410.16 frames. ], batch size: 75, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:11:41,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.292e+01 2.575e+01 2.942e+01 2.806e+02, threshold=5.150e+01, percent-clipped=2.0 2024-08-15 11:11:46,483 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 11:11:46,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3153770.0, ans=0.2 2024-08-15 11:11:51,096 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 19 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-15 11:12:22,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=12.0 2024-08-15 11:12:51,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3154170.0, ans=0.1 2024-08-15 11:13:00,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11100, loss[loss=0.1186, beats_loss=0.009448, ecapa_loss=0.0001279, whisper_loss=0.1079, over 15897.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001497, whisper_loss=0.09035, over 3957390.78 frames. 
], batch size: 58, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:13:06,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3154270.0, ans=0.125 2024-08-15 11:13:08,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-15 11:13:10,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2024-08-15 11:13:33,596 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 11:13:59,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-08-15 11:14:09,289 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 11:14:16,355 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11150, loss[loss=0.1182, beats_loss=0.01029, ecapa_loss=0.000122, whisper_loss=0.1067, over 22840.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001488, whisper_loss=0.09095, over 3948597.09 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:14:18,199 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 11:14:19,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.361e+01 2.547e+01 2.785e+01 4.285e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-15 11:14:32,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-08-15 11:14:43,172 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 11:15:10,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3155070.0, ans=0.125 2024-08-15 11:15:11,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3155070.0, ans=0.0 2024-08-15 11:15:15,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5 2024-08-15 11:15:22,646 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 11:15:23,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3155170.0, ans=0.125 2024-08-15 11:15:25,767 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 11:15:31,182 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11200, loss[loss=0.1262, beats_loss=0.008592, ecapa_loss=0.0001524, whisper_loss=0.1161, over 22680.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.000149, whisper_loss=0.09109, over 3937700.36 frames. ], batch size: 88, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:15:55,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3155370.0, ans=0.125 2024-08-15 11:16:04,366 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 11:16:05,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2024-08-15 11:16:05,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. 
limit=10.0 2024-08-15 11:16:16,902 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 11:16:20,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-15 11:16:27,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3155570.0, ans=0.125 2024-08-15 11:16:43,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3155770.0, ans=0.125 2024-08-15 11:16:43,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11250, loss[loss=0.09646, beats_loss=0.007604, ecapa_loss=0.0002054, whisper_loss=0.0868, over 15878.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001499, whisper_loss=0.09118, over 3911401.69 frames. ], batch size: 67, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:16:46,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.380e+01 2.622e+01 3.019e+01 1.107e+02, threshold=5.243e+01, percent-clipped=1.0 2024-08-15 11:17:04,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.23 vs. limit=22.5 2024-08-15 11:17:10,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3155870.0, ans=0.125 2024-08-15 11:17:10,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3155870.0, ans=0.125 2024-08-15 11:17:13,213 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 11:17:32,761 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.903e+00 2024-08-15 11:17:34,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3156070.0, ans=0.2 2024-08-15 11:17:35,271 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-15 11:17:53,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3156170.0, ans=0.125 2024-08-15 11:18:00,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11300, loss[loss=0.1203, beats_loss=0.007755, ecapa_loss=0.0001967, whisper_loss=0.1106, over 17099.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01045, ecapa_loss=0.0001504, whisper_loss=0.09131, over 3887094.20 frames. ], batch size: 64, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:18:10,094 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 11:19:13,677 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 11:19:15,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3156670.0, ans=0.0 2024-08-15 11:19:26,087 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11350, loss[loss=0.07411, beats_loss=0.01253, ecapa_loss=0.0001236, whisper_loss=0.06034, over 15023.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01036, ecapa_loss=0.00015, whisper_loss=0.09148, over 3871863.39 frames. 
], batch size: 60, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:19:29,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.374e+01 2.563e+01 2.940e+01 7.855e+01, threshold=5.126e+01, percent-clipped=1.0 2024-08-15 11:19:48,342 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 11:20:12,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3157070.0, ans=0.2 2024-08-15 11:20:28,862 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-15 11:20:39,069 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 11:20:40,172 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11400, loss[loss=0.1084, beats_loss=0.009204, ecapa_loss=0.0001434, whisper_loss=0.0978, over 22593.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01038, ecapa_loss=0.0001492, whisper_loss=0.09164, over 3840279.20 frames. ], batch size: 89, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:20:48,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3157270.0, ans=0.07 2024-08-15 11:21:04,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3157370.0, ans=0.2 2024-08-15 11:21:10,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3157470.0, ans=0.0 2024-08-15 11:21:18,626 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
20 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 11:21:18,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157470.0, ans=0.1 2024-08-15 11:21:23,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3157470.0, ans=0.07 2024-08-15 11:21:35,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=22.5 2024-08-15 11:21:57,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3157670.0, ans=0.125 2024-08-15 11:22:01,580 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11450, loss[loss=0.1029, beats_loss=0.01193, ecapa_loss=0.0001449, whisper_loss=0.08953, over 22236.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01042, ecapa_loss=0.0001499, whisper_loss=0.09186, over 3842576.52 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:22:04,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.396e+01 2.613e+01 2.879e+01 7.410e+02, threshold=5.227e+01, percent-clipped=0.0 2024-08-15 11:22:04,387 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07053599506616592, model_norm_threshold=52.26521682739258 2024-08-15 11:22:04,562 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.421e+04, grad_sumsq=9.420e+04, orig_rms_sq=5.754e-01 2024-08-15 11:22:12,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3157770.0, ans=0.125 2024-08-15 11:22:20,918 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 11:22:32,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3157870.0, ans=0.125 2024-08-15 11:22:36,557 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 11:22:43,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3157970.0, ans=0.2 2024-08-15 11:22:48,731 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-15 11:23:09,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3158170.0, ans=0.0 2024-08-15 11:23:22,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2024-08-15 11:23:23,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11500, loss[loss=0.1135, beats_loss=0.009536, ecapa_loss=0.0001598, whisper_loss=0.1023, over 21882.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001502, whisper_loss=0.09164, over 3858500.94 frames. ], batch size: 89, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:23:57,059 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 11:24:03,115 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2024-08-15 11:24:12,846 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 11:24:22,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3158570.0, ans=0.125 2024-08-15 11:24:31,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3158670.0, ans=0.07 2024-08-15 11:24:33,806 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-15 11:24:40,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11550, loss[loss=0.1149, beats_loss=0.01024, ecapa_loss=0.0001363, whisper_loss=0.1033, over 18685.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0104, ecapa_loss=0.0001497, whisper_loss=0.09204, over 3862433.01 frames. ], batch size: 72, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:24:41,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2024-08-15 11:24:44,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.411e+01 2.579e+01 2.880e+01 5.127e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-15 11:24:49,332 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 11:24:50,537 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 11:24:55,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3158870.0, ans=0.125 2024-08-15 11:24:57,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-08-15 11:25:03,251 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
26 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 11:25:11,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3158970.0, ans=0.125 2024-08-15 11:25:24,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3158970.0, ans=0.0 2024-08-15 11:25:43,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3159170.0, ans=0.125 2024-08-15 11:25:43,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3159170.0, ans=0.0 2024-08-15 11:25:44,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3159170.0, ans=0.1 2024-08-15 11:25:48,707 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 11:25:54,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3159170.0, ans=0.0 2024-08-15 11:25:57,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11600, loss[loss=0.1209, beats_loss=0.009905, ecapa_loss=0.0001475, whisper_loss=0.1095, over 18511.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.00015, whisper_loss=0.09157, over 3887928.34 frames. ], batch size: 74, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:26:13,797 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-15 11:26:22,844 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 11:26:32,113 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 11:26:44,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3159570.0, ans=0.125 2024-08-15 11:26:54,731 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 11:27:16,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11650, loss[loss=0.1069, beats_loss=0.01346, ecapa_loss=0.0001369, whisper_loss=0.09207, over 17351.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01041, ecapa_loss=0.000151, whisper_loss=0.09164, over 3874908.09 frames. ], batch size: 70, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:27:19,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.447e+01 2.693e+01 2.991e+01 1.020e+02, threshold=5.386e+01, percent-clipped=2.0 2024-08-15 11:27:26,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3159770.0, ans=0.1 2024-08-15 11:27:28,014 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 11:27:36,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3159870.0, ans=0.0 2024-08-15 11:27:38,658 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 11:27:49,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3159970.0, ans=0.0 2024-08-15 11:28:15,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3160070.0, ans=0.1 2024-08-15 11:28:24,776 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 11:28:27,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3160170.0, ans=0.125 2024-08-15 11:28:29,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3160170.0, ans=22.5 2024-08-15 11:28:32,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3160170.0, ans=0.125 2024-08-15 11:28:32,524 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-08-15 11:28:33,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2024-08-15 11:28:34,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11700, loss[loss=0.124, beats_loss=0.009736, ecapa_loss=0.000183, whisper_loss=0.1125, over 22951.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.000151, whisper_loss=0.09136, over 3922535.18 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:28:50,994 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 11:29:48,604 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11750, loss[loss=0.09895, beats_loss=0.01288, ecapa_loss=0.0001059, whisper_loss=0.08501, over 15654.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001512, whisper_loss=0.09097, over 3938921.56 frames. 
], batch size: 59, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:29:52,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.446e+01 2.685e+01 3.012e+01 3.635e+02, threshold=5.370e+01, percent-clipped=2.0 2024-08-15 11:30:11,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3160870.0, ans=0.125 2024-08-15 11:30:29,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3160970.0, ans=0.05 2024-08-15 11:30:34,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3161070.0, ans=0.0 2024-08-15 11:30:46,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3161070.0, ans=0.0 2024-08-15 11:31:03,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11800, loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001699, whisper_loss=0.09183, over 19707.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.00015, whisper_loss=0.09065, over 3904083.99 frames. ], batch size: 81, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:31:07,685 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-15 11:31:09,187 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 11:31:10,663 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 11:31:29,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3161370.0, ans=0.0 2024-08-15 11:31:32,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3161470.0, ans=0.0 2024-08-15 11:31:51,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3161570.0, ans=0.125 2024-08-15 11:31:52,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3161570.0, ans=0.2 2024-08-15 11:32:01,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3161670.0, ans=0.125 2024-08-15 11:32:04,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161670.0, ans=0.1 2024-08-15 11:32:04,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3161670.0, ans=0.125 2024-08-15 11:32:13,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3161670.0, ans=0.0 2024-08-15 11:32:15,555 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11850, loss[loss=0.1058, beats_loss=0.009156, ecapa_loss=0.0001839, whisper_loss=0.09482, over 17517.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001498, whisper_loss=0.09023, over 3916297.99 frames. 
], batch size: 72, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:32:17,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.442e+01 2.720e+01 2.983e+01 2.168e+02, threshold=5.440e+01, percent-clipped=2.0 2024-08-15 11:32:19,513 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 7 from Vox, 34 fro AS 2024-08-15 11:32:25,402 WARNING [optim.py:496] (1/4) Scaling gradients by 0.029826095327734947, model_norm_threshold=54.40060806274414 2024-08-15 11:32:25,583 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.872e+05, grad_sumsq=7.654e+04, orig_rms_sq=8.977e+00 2024-08-15 11:32:26,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-08-15 11:32:38,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3161870.0, ans=0.2 2024-08-15 11:32:41,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3161870.0, ans=0.0 2024-08-15 11:32:48,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=12.0 2024-08-15 11:32:59,685 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 11:33:01,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3162070.0, ans=0.2 2024-08-15 11:33:13,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3162170.0, ans=0.125 2024-08-15 11:33:23,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3162170.0, ans=0.125 2024-08-15 11:33:26,324 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-15 11:33:29,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11900, loss[loss=0.07614, beats_loss=0.01336, ecapa_loss=0.000151, whisper_loss=0.06128, over 21702.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001506, whisper_loss=0.09033, over 3920383.23 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:34:07,387 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 11:34:08,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3162470.0, ans=0.0 2024-08-15 11:34:10,045 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 11:34:14,647 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-15 11:34:17,529 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-15 11:34:19,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3162570.0, ans=0.0 2024-08-15 11:34:24,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-15 11:34:31,978 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 11:34:43,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 11950, loss[loss=0.09248, beats_loss=0.01221, ecapa_loss=0.0001226, whisper_loss=0.07905, over 18582.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001503, whisper_loss=0.09028, over 3883559.37 frames. ], batch size: 73, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:34:46,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.257e+01 2.477e+01 2.736e+01 1.824e+03, threshold=4.954e+01, percent-clipped=1.0 2024-08-15 11:35:13,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3162970.0, ans=0.125 2024-08-15 11:35:22,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3162970.0, ans=0.125 2024-08-15 11:35:35,398 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 11:35:53,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3163170.0, ans=0.0 2024-08-15 11:35:54,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. 
limit=15.0 2024-08-15 11:35:57,244 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12000, loss[loss=0.08492, beats_loss=0.0105, ecapa_loss=0.0001307, whisper_loss=0.07312, over 18751.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001505, whisper_loss=0.09029, over 3870762.82 frames. ], batch size: 72, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:35:57,244 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 11:36:35,803 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005396, whisper_loss=0.2462, over 922467.00 frames. 2024-08-15 11:36:55,973 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on SV_voxceleb1: loss=0.004196, beats_loss=0, ecapa_loss=0.0004196, whisper_loss=0, over 939242.00 frames. 2024-08-15 11:38:26,596 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1228, 3.2184, 3.2863, 3.1344], device='cuda:1') 2024-08-15 11:38:51,375 INFO [train_multi_KD3.py:1149] (1/4) Epoch 22, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 11:38:51,378 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 11:38:55,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3163270.0, ans=0.2 2024-08-15 11:39:02,203 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 11:39:04,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2024-08-15 11:39:09,809 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 11:39:21,266 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
22 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 11:39:21,900 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2024-08-15 11:39:28,652 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 11:39:31,453 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 11:39:46,557 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 11:39:51,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3163670.0, ans=0.1 2024-08-15 11:40:04,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12050, loss[loss=0.09516, beats_loss=0.01183, ecapa_loss=0.0001365, whisper_loss=0.08196, over 13828.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001509, whisper_loss=0.09024, over 3838566.29 frames. 
], batch size: 56, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:40:07,898 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.427e+01 2.583e+01 3.021e+01 1.024e+02, threshold=5.165e+01, percent-clipped=2.0 2024-08-15 11:40:14,370 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.869e+00 2024-08-15 11:40:21,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3163870.0, ans=0.1 2024-08-15 11:40:21,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3163870.0, ans=0.125 2024-08-15 11:40:23,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3163870.0, ans=0.125 2024-08-15 11:41:13,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3164170.0, ans=0.125 2024-08-15 11:41:14,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3164170.0, ans=0.125 2024-08-15 11:41:18,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3164270.0, ans=0.1 2024-08-15 11:41:19,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12100, loss[loss=0.1046, beats_loss=0.01162, ecapa_loss=0.000155, whisper_loss=0.09146, over 17254.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001515, whisper_loss=0.09007, over 3825232.51 frames. ], batch size: 71, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:41:35,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.23 vs. 
limit=15.0 2024-08-15 11:41:39,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2024-08-15 11:41:42,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3164370.0, ans=0.0 2024-08-15 11:41:53,647 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 11:42:02,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3164570.0, ans=0.125 2024-08-15 11:42:03,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3164570.0, ans=15.0 2024-08-15 11:42:08,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3164570.0, ans=0.125 2024-08-15 11:42:14,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3164570.0, ans=0.125 2024-08-15 11:42:25,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.60 vs. limit=12.0 2024-08-15 11:42:31,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12150, loss[loss=0.1157, beats_loss=0.0111, ecapa_loss=0.0001481, whisper_loss=0.1032, over 19756.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001515, whisper_loss=0.0901, over 3852937.39 frames. ], batch size: 79, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:42:31,878 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 11:42:34,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.205e+01 2.453e+01 2.798e+01 9.875e+01, threshold=4.907e+01, percent-clipped=1.0 2024-08-15 11:42:45,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3164870.0, ans=0.125 2024-08-15 11:43:13,575 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 11:43:17,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3165070.0, ans=0.125 2024-08-15 11:43:18,209 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 11:43:26,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-15 11:43:31,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3165170.0, ans=0.0 2024-08-15 11:43:32,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.24 vs. limit=15.0 2024-08-15 11:43:39,046 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 11:43:46,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12200, loss[loss=0.09511, beats_loss=0.01082, ecapa_loss=0.0001421, whisper_loss=0.08287, over 21388.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001508, whisper_loss=0.09031, over 3868274.51 frames. ], batch size: 85, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:43:53,017 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 11:44:19,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3165470.0, ans=0.0 2024-08-15 11:44:24,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0 2024-08-15 11:44:27,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2024-08-15 11:44:29,793 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-15 11:44:55,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3165670.0, ans=0.0 2024-08-15 11:45:01,761 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12250, loss[loss=0.1057, beats_loss=0.01049, ecapa_loss=0.0001621, whisper_loss=0.09361, over 21033.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001514, whisper_loss=0.09036, over 3846622.25 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:45:04,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.418e+01 2.740e+01 3.244e+01 5.356e+01, threshold=5.480e+01, percent-clipped=1.0 2024-08-15 11:46:02,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3166170.0, ans=0.125 2024-08-15 11:46:16,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12300, loss[loss=0.1006, beats_loss=0.01033, ecapa_loss=0.0001653, whisper_loss=0.08862, over 19590.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001512, whisper_loss=0.09039, over 3843747.10 frames. 
], batch size: 80, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:46:30,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.72 vs. limit=10.0 2024-08-15 11:46:31,044 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 11:46:35,661 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 11:46:37,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3166370.0, ans=0.125 2024-08-15 11:46:39,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3166370.0, ans=0.125 2024-08-15 11:46:40,072 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 40 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 11:46:44,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2024-08-15 11:46:47,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3166470.0, ans=0.0 2024-08-15 11:46:50,598 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 11:46:57,511 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
24 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-15 11:47:10,819 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.050e-02 2024-08-15 11:47:12,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3166570.0, ans=0.0 2024-08-15 11:47:22,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3166670.0, ans=0.05 2024-08-15 11:47:29,264 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12350, loss[loss=0.1017, beats_loss=0.009757, ecapa_loss=0.0001517, whisper_loss=0.09038, over 15592.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001527, whisper_loss=0.09035, over 3831480.19 frames. ], batch size: 61, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:47:32,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.362e+01 2.585e+01 2.912e+01 4.342e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 11:47:37,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3166770.0, ans=0.125 2024-08-15 11:47:59,618 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 11:48:02,481 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 11:48:09,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2024-08-15 11:48:11,797 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 11:48:19,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. 
limit=15.0 2024-08-15 11:48:28,186 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 11:48:29,920 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-15 11:48:43,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12400, loss[loss=0.1056, beats_loss=0.01151, ecapa_loss=0.0001464, whisper_loss=0.0926, over 21496.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001513, whisper_loss=0.09055, over 3854874.30 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:48:44,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3167270.0, ans=0.125 2024-08-15 11:48:45,758 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 11:48:46,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3167270.0, ans=0.125 2024-08-15 11:48:50,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3167270.0, ans=0.0 2024-08-15 11:49:21,475 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 31 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-15 11:49:42,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.06 vs. limit=22.5 2024-08-15 11:49:45,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3167670.0, ans=0.125 2024-08-15 11:49:58,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12450, loss[loss=0.1225, beats_loss=0.007844, ecapa_loss=0.0001556, whisper_loss=0.1131, over 15088.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001518, whisper_loss=0.09042, over 3852691.71 frames. ], batch size: 56, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:50:01,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.281e+01 2.553e+01 2.853e+01 4.118e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-15 11:50:09,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3167770.0, ans=0.0 2024-08-15 11:50:18,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3167870.0, ans=0.1 2024-08-15 11:50:23,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3167870.0, ans=0.0 2024-08-15 11:50:33,850 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 11:50:37,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3167970.0, ans=6.0 2024-08-15 11:51:09,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3168170.0, ans=0.04949747468305833 2024-08-15 11:51:11,608 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12500, loss[loss=0.1136, beats_loss=0.009408, ecapa_loss=0.0001277, whisper_loss=0.1029, over 20126.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001505, whisper_loss=0.09066, over 3894026.49 frames. ], batch size: 76, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:51:17,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.06 vs. 
limit=15.0 2024-08-15 11:51:41,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3168470.0, ans=0.1 2024-08-15 11:52:11,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3168670.0, ans=0.2 2024-08-15 11:52:15,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3168670.0, ans=0.09899494936611666 2024-08-15 11:52:17,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3168670.0, ans=0.1 2024-08-15 11:52:26,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12550, loss[loss=0.1005, beats_loss=0.01144, ecapa_loss=0.0001622, whisper_loss=0.08741, over 16277.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001511, whisper_loss=0.09021, over 3889080.46 frames. ], batch size: 66, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:52:29,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.452e+01 2.757e+01 2.941e+01 1.392e+02, threshold=5.513e+01, percent-clipped=1.0 2024-08-15 11:52:30,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3168770.0, ans=0.125 2024-08-15 11:52:42,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3168870.0, ans=0.0 2024-08-15 11:52:44,675 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 11:53:12,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3169070.0, ans=10.0 2024-08-15 11:53:17,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3169070.0, ans=0.125 2024-08-15 11:53:17,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-15 11:53:21,424 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 11:53:26,876 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 11:53:33,942 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 11:53:40,991 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12600, loss[loss=0.08567, beats_loss=0.01112, ecapa_loss=0.0001485, whisper_loss=0.07306, over 17593.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001509, whisper_loss=0.09108, over 3894525.25 frames. ], batch size: 72, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:53:41,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3169270.0, ans=0.0 2024-08-15 11:53:46,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=12.0 2024-08-15 11:53:52,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.67 vs. 
limit=12.0 2024-08-15 11:53:53,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3169270.0, ans=0.0 2024-08-15 11:54:08,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3169370.0, ans=0.125 2024-08-15 11:54:23,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3169470.0, ans=0.125 2024-08-15 11:54:56,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12650, loss[loss=0.1121, beats_loss=0.01059, ecapa_loss=0.0001504, whisper_loss=0.09997, over 22495.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.09074, over 3895751.15 frames. ], batch size: 88, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:54:58,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3169770.0, ans=0.125 2024-08-15 11:54:58,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.444e+01 2.686e+01 2.954e+01 5.186e+01, threshold=5.373e+01, percent-clipped=0.0 2024-08-15 11:54:59,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3169770.0, ans=0.1 2024-08-15 11:55:08,128 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:55:24,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3169970.0, ans=0.5 2024-08-15 11:55:24,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3169970.0, ans=0.125 2024-08-15 11:55:34,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, 
batch_count=3169970.0, ans=0.0 2024-08-15 11:55:35,197 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 11:55:47,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3170070.0, ans=0.2 2024-08-15 11:55:49,963 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 11:55:51,384 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 11:55:54,409 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 11:55:56,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-15 11:55:58,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3170170.0, ans=0.1 2024-08-15 11:56:09,280 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12700, loss[loss=0.09124, beats_loss=0.01403, ecapa_loss=0.0001175, whisper_loss=0.07603, over 21461.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001499, whisper_loss=0.09073, over 3867964.84 frames. ], batch size: 88, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:56:16,803 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 11:56:30,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-08-15 11:56:32,590 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-15 11:56:32,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3170370.0, ans=0.125 2024-08-15 11:56:42,629 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 11:56:51,351 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 11:56:57,413 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-15 11:57:02,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3170570.0, ans=0.04949747468305833 2024-08-15 11:57:22,267 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12750, loss[loss=0.09019, beats_loss=0.01253, ecapa_loss=0.0001163, whisper_loss=0.0765, over 17917.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.0001499, whisper_loss=0.09041, over 3891689.38 frames. ], batch size: 70, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:57:22,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3170770.0, ans=0.2 2024-08-15 11:57:25,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.276e+01 2.433e+01 2.763e+01 4.017e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-15 11:57:55,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3170970.0, ans=0.125 2024-08-15 11:58:14,271 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 11:58:22,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3171070.0, ans=0.125 2024-08-15 11:58:26,431 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 11:58:35,496 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 11 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 11:58:37,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3171170.0, ans=0.125 2024-08-15 11:58:39,826 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12800, loss[loss=0.09424, beats_loss=0.01217, ecapa_loss=0.0001039, whisper_loss=0.08103, over 15607.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01077, ecapa_loss=0.0001487, whisper_loss=0.09024, over 3897743.65 frames. ], batch size: 58, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:58:45,063 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 11:58:45,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3171270.0, ans=0.2 2024-08-15 11:58:59,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3171370.0, ans=0.125 2024-08-15 11:59:02,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3171370.0, ans=0.0 2024-08-15 11:59:15,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=22.5 2024-08-15 11:59:23,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3171570.0, ans=0.95 2024-08-15 11:59:54,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12850, loss[loss=0.1218, beats_loss=0.009321, ecapa_loss=0.0001668, whisper_loss=0.1108, over 17575.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.000151, whisper_loss=0.09049, over 3902699.56 frames. 
], batch size: 69, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:59:57,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.257e+01 2.519e+01 2.816e+01 4.550e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-15 12:00:14,139 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 12:00:17,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3171870.0, ans=0.2 2024-08-15 12:00:21,598 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 12:00:30,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3171970.0, ans=0.125 2024-08-15 12:00:30,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3171970.0, ans=0.125 2024-08-15 12:00:45,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3172070.0, ans=0.2 2024-08-15 12:01:04,267 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 12:01:08,156 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12900, loss[loss=0.1239, beats_loss=0.008634, ecapa_loss=0.0001448, whisper_loss=0.1138, over 15150.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01082, ecapa_loss=0.0001515, whisper_loss=0.08946, over 3896352.52 frames. 
], batch size: 57, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:01:08,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3172270.0, ans=0.0 2024-08-15 12:01:41,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3172470.0, ans=0.0 2024-08-15 12:01:41,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-15 12:02:00,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3172570.0, ans=0.125 2024-08-15 12:02:04,693 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 12:02:15,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3172670.0, ans=0.07 2024-08-15 12:02:17,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-15 12:02:21,812 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 12950, loss[loss=0.09301, beats_loss=0.01296, ecapa_loss=0.0001733, whisper_loss=0.07832, over 20016.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01079, ecapa_loss=0.0001525, whisper_loss=0.08952, over 3893795.41 frames. 
], batch size: 86, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:02:24,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3172770.0, ans=0.1 2024-08-15 12:02:24,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.275e+01 2.546e+01 2.873e+01 4.108e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-15 12:02:40,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3172870.0, ans=0.0 2024-08-15 12:02:42,754 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-15 12:02:47,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3172870.0, ans=0.2 2024-08-15 12:02:50,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3172870.0, ans=0.125 2024-08-15 12:02:54,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3172970.0, ans=0.125 2024-08-15 12:03:06,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3173070.0, ans=0.1 2024-08-15 12:03:18,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=12.0 2024-08-15 12:03:20,079 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.655e-03 2024-08-15 12:03:30,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3173170.0, ans=0.025 2024-08-15 12:03:34,099 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
42 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 12:03:34,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3173170.0, ans=0.0 2024-08-15 12:03:37,114 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13000, loss[loss=0.07792, beats_loss=0.01357, ecapa_loss=0.0001227, whisper_loss=0.06313, over 16765.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01071, ecapa_loss=0.0001526, whisper_loss=0.08982, over 3905400.55 frames. ], batch size: 69, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:04:05,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2024-08-15 12:04:16,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3173470.0, ans=0.05 2024-08-15 12:04:20,484 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0589878149330616, model_norm_threshold=50.92251968383789 2024-08-15 12:04:20,654 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.49, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.672e+05, grad_sumsq=3.654e+07, orig_rms_sq=1.005e-02 2024-08-15 12:04:21,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3173570.0, ans=0.125 2024-08-15 12:04:39,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3173670.0, ans=0.1 2024-08-15 12:04:42,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173670.0, ans=0.1 2024-08-15 12:04:48,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3173670.0, 
ans=0.125 2024-08-15 12:04:51,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13050, loss[loss=0.07704, beats_loss=0.01352, ecapa_loss=0.0001248, whisper_loss=0.06227, over 17260.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.0001516, whisper_loss=0.08982, over 3890721.12 frames. ], batch size: 70, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:04:54,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.413e+01 2.554e+01 2.771e+01 8.633e+02, threshold=5.107e+01, percent-clipped=2.0 2024-08-15 12:05:06,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3173870.0, ans=0.125 2024-08-15 12:05:11,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-15 12:05:20,248 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-15 12:05:32,462 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 12:05:37,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3174070.0, ans=0.0 2024-08-15 12:06:01,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3174170.0, ans=0.125 2024-08-15 12:06:06,672 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13100, loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001788, whisper_loss=0.09079, over 18617.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001512, whisper_loss=0.09022, over 3900341.90 frames. ], batch size: 79, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:06:34,460 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 12:07:20,593 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=8.0 2024-08-15 12:07:20,964 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13150, loss[loss=0.09452, beats_loss=0.01288, ecapa_loss=0.0001063, whisper_loss=0.08057, over 16296.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.09084, over 3892290.19 frames. ], batch size: 64, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:07:23,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.322e+01 2.580e+01 2.894e+01 4.254e+01, threshold=5.159e+01, percent-clipped=0.0 2024-08-15 12:07:27,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3174770.0, ans=0.125 2024-08-15 12:07:33,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3174770.0, ans=0.0 2024-08-15 12:07:46,228 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 12:07:49,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3174970.0, ans=0.125 2024-08-15 12:07:50,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3174970.0, ans=0.1 2024-08-15 12:07:55,320 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 12:07:55,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3174970.0, ans=0.125 2024-08-15 12:08:05,590 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 12:08:34,115 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13200, loss[loss=0.09377, beats_loss=0.01386, ecapa_loss=0.0001073, whisper_loss=0.07883, over 17050.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001505, whisper_loss=0.0915, over 3899647.10 frames. ], batch size: 67, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:08:37,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3175270.0, ans=0.125 2024-08-15 12:08:45,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3175270.0, ans=0.0 2024-08-15 12:08:59,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3175370.0, ans=0.0 2024-08-15 12:09:02,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3175370.0, ans=0.0 2024-08-15 12:09:07,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3175470.0, ans=0.125 2024-08-15 12:09:08,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3175470.0, ans=0.125 2024-08-15 12:09:38,042 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 12:09:50,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13250, loss[loss=0.1147, beats_loss=0.01128, ecapa_loss=0.0001197, whisper_loss=0.1022, over 23192.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001507, whisper_loss=0.09131, over 3868124.27 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:09:52,330 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
36 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 12:09:53,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.280e+01 2.540e+01 2.785e+01 5.121e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-15 12:10:44,829 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 34 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 12:11:04,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3176270.0, ans=0.025 2024-08-15 12:11:05,748 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13300, loss[loss=0.1147, beats_loss=0.009732, ecapa_loss=0.0001823, whisper_loss=0.1031, over 21412.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.0001507, whisper_loss=0.09158, over 3874724.26 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:11:06,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-08-15 12:11:07,299 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 12:11:14,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3176270.0, ans=0.125 2024-08-15 12:11:16,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3176270.0, ans=0.125 2024-08-15 12:11:19,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3176370.0, ans=0.125 2024-08-15 12:11:29,252 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 12:11:33,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3176470.0, ans=0.125 2024-08-15 12:11:36,423 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 12:12:01,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3176570.0, ans=0.125 2024-08-15 12:12:05,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=12.0 2024-08-15 12:12:18,522 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13350, loss[loss=0.09951, beats_loss=0.01209, ecapa_loss=0.0001448, whisper_loss=0.08598, over 22506.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01052, ecapa_loss=0.0001512, whisper_loss=0.09155, over 3872814.40 frames. ], batch size: 94, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:12:21,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.329e+01 2.607e+01 2.940e+01 2.592e+02, threshold=5.213e+01, percent-clipped=3.0 2024-08-15 12:12:26,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3176770.0, ans=0.1 2024-08-15 12:12:29,286 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 12:12:42,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-15 12:13:03,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3177070.0, ans=0.125 2024-08-15 12:13:14,769 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 12:13:19,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3177170.0, ans=0.125 2024-08-15 12:13:32,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13400, loss[loss=0.08773, beats_loss=0.01288, ecapa_loss=0.0001563, whisper_loss=0.07329, over 20269.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001516, whisper_loss=0.09086, over 3889361.63 frames. ], batch size: 85, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:13:39,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=12.0 2024-08-15 12:13:40,221 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-15 12:13:56,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3177370.0, ans=0.0 2024-08-15 12:13:56,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-08-15 12:14:02,298 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 12:14:09,465 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 12:14:09,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3177470.0, ans=0.2 2024-08-15 12:14:15,168 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 12:14:22,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3177570.0, ans=0.125 2024-08-15 12:14:30,705 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 12:14:37,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3177670.0, ans=0.125 2024-08-15 12:14:39,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2024-08-15 12:14:45,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13450, loss[loss=0.1088, beats_loss=0.0107, ecapa_loss=0.0001536, whisper_loss=0.09654, over 20776.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.000151, whisper_loss=0.09082, over 3897699.88 frames. ], batch size: 84, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:14:48,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.536e+01 2.666e+01 2.899e+01 1.016e+02, threshold=5.331e+01, percent-clipped=2.0 2024-08-15 12:15:00,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3177870.0, ans=0.125 2024-08-15 12:15:02,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3177870.0, ans=0.1 2024-08-15 12:15:03,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3177870.0, ans=0.04949747468305833 2024-08-15 12:15:12,534 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 12:15:29,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3178070.0, ans=0.1 2024-08-15 12:15:31,204 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 12:15:32,609 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
23 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-15 12:15:36,152 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 12:15:40,643 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 12:16:00,536 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13500, loss[loss=0.08704, beats_loss=0.01227, ecapa_loss=0.0001347, whisper_loss=0.07342, over 22293.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001508, whisper_loss=0.0906, over 3881421.79 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:16:02,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3178270.0, ans=0.125 2024-08-15 12:16:08,369 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 6 from Vox, 30 fro AS 2024-08-15 12:16:17,106 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 12:16:31,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3178470.0, ans=0.2 2024-08-15 12:16:44,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3178570.0, ans=0.1 2024-08-15 12:16:44,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3178570.0, ans=0.125 2024-08-15 12:16:45,867 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
33 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 12:17:08,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3178670.0, ans=0.0 2024-08-15 12:17:14,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13550, loss[loss=0.1067, beats_loss=0.01273, ecapa_loss=0.0001109, whisper_loss=0.09283, over 20081.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.0906, over 3897771.36 frames. ], batch size: 79, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:17:16,569 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 12:17:17,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.308e+01 2.563e+01 2.825e+01 4.152e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-15 12:17:19,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3178770.0, ans=0.125 2024-08-15 12:17:22,121 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 12:17:31,547 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2024-08-15 12:17:32,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3178870.0, ans=0.2 2024-08-15 12:17:42,622 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 12:17:58,941 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 12:18:25,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3179170.0, ans=0.0 2024-08-15 12:18:28,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13600, loss[loss=0.08728, beats_loss=0.009005, ecapa_loss=0.000173, whisper_loss=0.07655, over 14345.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001496, whisper_loss=0.0909, over 3901766.92 frames. ], batch size: 55, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:18:33,183 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 12:18:36,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3179270.0, ans=0.125 2024-08-15 12:18:36,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3179270.0, ans=0.2 2024-08-15 12:18:56,580 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-15 12:18:57,797 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 12:19:01,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.56 vs. limit=10.0 2024-08-15 12:19:02,303 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 12:19:12,646 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 38 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 12:19:14,040 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
16 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 12:19:41,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13650, loss[loss=0.09832, beats_loss=0.01235, ecapa_loss=0.0001517, whisper_loss=0.08446, over 18590.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001502, whisper_loss=0.09039, over 3895402.91 frames. ], batch size: 75, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:19:42,298 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 12:19:44,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.314e+01 2.512e+01 2.853e+01 1.013e+02, threshold=5.025e+01, percent-clipped=2.0 2024-08-15 12:19:46,647 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 12:20:16,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3179970.0, ans=0.125 2024-08-15 12:20:18,314 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 12:20:24,455 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 12:20:25,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-15 12:20:55,491 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13700, loss[loss=0.1059, beats_loss=0.0101, ecapa_loss=0.0001376, whisper_loss=0.09442, over 16688.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001507, whisper_loss=0.09053, over 3899263.34 frames. 
], batch size: 63, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:21:11,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3180370.0, ans=0.0 2024-08-15 12:21:19,057 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 12:21:46,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3180570.0, ans=0.125 2024-08-15 12:21:53,348 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 12:22:11,418 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13750, loss[loss=0.1167, beats_loss=0.008285, ecapa_loss=0.0001757, whisper_loss=0.1067, over 21918.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001505, whisper_loss=0.09055, over 3890820.81 frames. ], batch size: 86, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:22:14,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.271e+01 2.530e+01 2.885e+01 4.854e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 12:22:14,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3180770.0, ans=0.125 2024-08-15 12:22:20,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3180770.0, ans=0.0 2024-08-15 12:22:39,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2024-08-15 12:22:41,744 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 12:23:13,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3181170.0, ans=0.1 2024-08-15 12:23:13,853 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.049e+00 2024-08-15 12:23:25,776 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13800, loss[loss=0.1048, beats_loss=0.01035, ecapa_loss=0.000175, whisper_loss=0.09268, over 17818.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001492, whisper_loss=0.09046, over 3895918.15 frames. ], batch size: 72, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:23:42,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3181370.0, ans=0.125 2024-08-15 12:23:42,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3181370.0, ans=0.1 2024-08-15 12:23:51,072 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-15 12:24:10,561 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 12:24:40,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13850, loss[loss=0.1025, beats_loss=0.008747, ecapa_loss=0.0001615, whisper_loss=0.09213, over 17662.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001492, whisper_loss=0.09102, over 3917799.68 frames. ], batch size: 69, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:24:43,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.369e+01 2.668e+01 2.994e+01 7.332e+01, threshold=5.336e+01, percent-clipped=2.0 2024-08-15 12:24:55,029 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 12:25:11,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3181970.0, ans=0.0 2024-08-15 12:25:15,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-15 12:25:16,733 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 12:25:18,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3181970.0, ans=0.125 2024-08-15 12:25:34,431 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 12:25:34,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3182070.0, ans=0.09899494936611666 2024-08-15 12:25:50,787 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-15 12:25:53,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13900, loss[loss=0.09812, beats_loss=0.01081, ecapa_loss=0.0001599, whisper_loss=0.08571, over 20278.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001485, whisper_loss=0.09047, over 3874468.18 frames. 
], batch size: 82, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:26:00,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3182270.0, ans=12.0 2024-08-15 12:26:12,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3182370.0, ans=0.2 2024-08-15 12:26:35,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3182470.0, ans=0.0 2024-08-15 12:26:47,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-15 12:26:56,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3182670.0, ans=0.0 2024-08-15 12:26:58,751 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 12:27:01,482 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 13 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-15 12:27:05,805 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 12:27:06,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 13950, loss[loss=0.1249, beats_loss=0.009283, ecapa_loss=0.0001482, whisper_loss=0.1141, over 22518.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001478, whisper_loss=0.09054, over 3897574.02 frames. 
], batch size: 89, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:27:09,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.273e+01 2.481e+01 2.745e+01 4.473e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-15 12:27:09,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3182770.0, ans=0.125 2024-08-15 12:27:23,874 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-15 12:27:28,200 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 12:27:50,942 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 12:27:51,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3183070.0, ans=0.1 2024-08-15 12:28:06,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3183170.0, ans=0.125 2024-08-15 12:28:14,787 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 12:28:20,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14000, loss[loss=0.113, beats_loss=0.009784, ecapa_loss=0.0001254, whisper_loss=0.1019, over 23529.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001477, whisper_loss=0.09057, over 3903233.58 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:28:27,797 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 12:28:47,498 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 12:29:25,733 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 12:29:31,466 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 12:29:34,307 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14050, loss[loss=0.08203, beats_loss=0.01279, ecapa_loss=0.0001703, whisper_loss=0.06753, over 14478.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01047, ecapa_loss=0.0001481, whisper_loss=0.09175, over 3918344.09 frames. ], batch size: 62, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:29:37,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.178e+01 2.428e+01 2.740e+01 4.100e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-15 12:29:54,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3183870.0, ans=0.125 2024-08-15 12:30:05,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3183970.0, ans=0.125 2024-08-15 12:30:34,795 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 12:30:50,223 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14100, loss[loss=0.1003, beats_loss=0.01228, ecapa_loss=0.0001491, whisper_loss=0.0865, over 16324.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001473, whisper_loss=0.0912, over 3910952.02 frames. ], batch size: 66, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:30:56,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.44 vs. limit=10.0 2024-08-15 12:30:56,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. 
limit=15.0 2024-08-15 12:30:58,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3184270.0, ans=0.2 2024-08-15 12:31:14,919 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-15 12:31:16,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3184370.0, ans=0.125 2024-08-15 12:31:23,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3184470.0, ans=0.0 2024-08-15 12:31:26,739 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 12:31:46,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3184570.0, ans=0.125 2024-08-15 12:31:59,220 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 12:32:03,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14150, loss[loss=0.1181, beats_loss=0.011, ecapa_loss=0.0001695, whisper_loss=0.1054, over 21342.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001476, whisper_loss=0.09128, over 3892110.40 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:32:03,371 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 12:32:06,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.412e+01 2.567e+01 2.890e+01 3.775e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-15 12:32:08,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3184770.0, ans=0.125 2024-08-15 12:32:09,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3184770.0, ans=0.0 2024-08-15 12:32:10,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5 2024-08-15 12:32:30,836 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 12:33:01,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-08-15 12:33:04,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3185170.0, ans=0.0 2024-08-15 12:33:06,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3185170.0, ans=0.125 2024-08-15 12:33:08,268 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2024-08-15 12:33:22,191 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14200, loss[loss=0.103, beats_loss=0.009027, ecapa_loss=0.0001413, whisper_loss=0.09252, over 15454.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001488, whisper_loss=0.09063, over 3891321.73 frames. 
], batch size: 56, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:33:30,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3185270.0, ans=0.1 2024-08-15 12:33:32,951 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 34 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 12:33:43,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3185370.0, ans=0.2 2024-08-15 12:33:56,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3185470.0, ans=0.125 2024-08-15 12:34:01,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3185470.0, ans=0.125 2024-08-15 12:34:09,090 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 12:34:23,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3185570.0, ans=0.0 2024-08-15 12:34:30,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3185670.0, ans=0.0 2024-08-15 12:34:36,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3185670.0, ans=0.125 2024-08-15 12:34:37,346 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 12:34:44,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14250, loss[loss=0.1143, beats_loss=0.009664, ecapa_loss=0.0001597, whisper_loss=0.1031, over 15884.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001491, whisper_loss=0.09074, over 3900331.43 frames. 
], batch size: 63, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:34:49,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.313e+01 2.543e+01 2.810e+01 4.306e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-15 12:34:49,975 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 12:34:59,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-15 12:35:03,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3185870.0, ans=0.1 2024-08-15 12:35:03,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3185870.0, ans=0.2 2024-08-15 12:35:08,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3185870.0, ans=0.1 2024-08-15 12:35:59,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3186070.0, ans=0.125 2024-08-15 12:36:18,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3186170.0, ans=0.125 2024-08-15 12:36:23,856 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14300, loss[loss=0.09916, beats_loss=0.009459, ecapa_loss=0.0001606, whisper_loss=0.08809, over 16927.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001492, whisper_loss=0.09077, over 3922304.81 frames. ], batch size: 69, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:36:39,730 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
21 from LS+wenet, 19 from Vox, 52 fro AS 2024-08-15 12:36:44,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3186370.0, ans=0.125 2024-08-15 12:36:46,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3186370.0, ans=0.0 2024-08-15 12:37:02,983 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 12:37:09,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3186470.0, ans=0.1 2024-08-15 12:37:18,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3186470.0, ans=0.125 2024-08-15 12:37:25,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186570.0, ans=0.1 2024-08-15 12:37:53,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-08-15 12:37:56,132 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:38:03,688 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14350, loss[loss=0.1113, beats_loss=0.008675, ecapa_loss=0.0001662, whisper_loss=0.101, over 19042.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001488, whisper_loss=0.0908, over 3950849.28 frames. 
], batch size: 74, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:38:09,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.292e+01 2.515e+01 2.764e+01 5.097e+01, threshold=5.030e+01, percent-clipped=1.0 2024-08-15 12:38:10,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3186770.0, ans=0.125 2024-08-15 12:39:04,496 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 12:39:06,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3187070.0, ans=0.125 2024-08-15 12:39:09,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3187070.0, ans=0.125 2024-08-15 12:39:28,971 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 12:39:30,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3187170.0, ans=0.125 2024-08-15 12:39:46,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14400, loss[loss=0.1005, beats_loss=0.009596, ecapa_loss=0.0001219, whisper_loss=0.08971, over 14873.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001494, whisper_loss=0.09114, over 3956789.57 frames. ], batch size: 56, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:39:53,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3187270.0, ans=0.0 2024-08-15 12:40:09,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3187370.0, ans=0.125 2024-08-15 12:40:13,746 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
35 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 12:40:44,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3187570.0, ans=0.2 2024-08-15 12:40:54,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3187670.0, ans=0.0 2024-08-15 12:40:59,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3187670.0, ans=0.035 2024-08-15 12:41:06,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 22, batch 14450, loss[loss=0.0848, beats_loss=0.0129, ecapa_loss=0.0001302, whisper_loss=0.07061, over 18999.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001492, whisper_loss=0.09152, over 3953241.78 frames. ], batch size: 77, lr: 2.80e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:41:12,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.363e+01 2.570e+01 2.963e+01 1.669e+02, threshold=5.140e+01, percent-clipped=2.0 2024-08-15 12:41:12,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3187770.0, ans=0.125 2024-08-15 12:41:14,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2024-08-15 12:41:25,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3187870.0, ans=0.1 2024-08-15 12:41:26,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3187870.0, ans=0.125 2024-08-15 12:41:31,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3187870.0, ans=0.125 2024-08-15 12:41:32,398 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-15 12:41:35,674 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.176e-01 2024-08-15 12:41:38,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3187970.0, ans=0.0 2024-08-15 12:41:45,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3187970.0, ans=0.0 2024-08-15 12:41:54,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3188070.0, ans=0.125 2024-08-15 12:42:01,289 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 25 from LS+wenet, 10 from Vox, 20 fro AS 2024-08-15 12:42:03,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2024-08-15 12:42:46,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 0, loss[loss=0.1207, beats_loss=0.009426, ecapa_loss=0.0001114, whisper_loss=0.1101, over 17854.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.009426, ecapa_loss=0.0001114, whisper_loss=0.1101, over 17854.00 frames. 
], batch size: 66, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:42:46,638 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 12:43:28,487 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2464, over 922467.00 frames. 2024-08-15 12:43:45,205 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on SV_voxceleb1: loss=0.00428, beats_loss=0, ecapa_loss=0.000428, whisper_loss=0, over 939242.00 frames. 2024-08-15 12:45:43,888 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on AT_audioset: loss=0.02325, beats_loss=0.02325, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 12:45:43,891 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 12:45:48,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3188220.0, ans=0.1 2024-08-15 12:46:07,860 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-15 12:46:22,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3188320.0, ans=0.0 2024-08-15 12:46:46,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3188420.0, ans=0.09899494936611666 2024-08-15 12:47:46,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2024-08-15 12:47:48,052 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 12:47:50,372 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 50, loss[loss=0.1061, beats_loss=0.007799, ecapa_loss=0.0001588, whisper_loss=0.09668, over 16792.00 frames. 
], tot_loss[loss=0.09948, beats_loss=0.009629, ecapa_loss=0.0001573, whisper_loss=0.08828, over 860605.51 frames. ], batch size: 65, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:48:06,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3188720.0, ans=0.0 2024-08-15 12:48:13,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.413e+01 2.733e+01 3.074e+01 3.899e+01, threshold=5.466e+01, percent-clipped=0.0 2024-08-15 12:48:58,116 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.66 vs. limit=22.5 2024-08-15 12:49:02,285 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:49:21,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3189020.0, ans=0.025 2024-08-15 12:49:24,973 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 12:49:32,176 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 12:49:40,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3189120.0, ans=0.125 2024-08-15 12:49:49,916 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 100, loss[loss=0.0859, beats_loss=0.01147, ecapa_loss=0.0001335, whisper_loss=0.07309, over 15994.00 frames. ], tot_loss[loss=0.09957, beats_loss=0.009839, ecapa_loss=0.0001515, whisper_loss=0.08822, over 1496761.04 frames. ], batch size: 63, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:50:27,503 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 12:50:41,511 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 12:51:10,171 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 12:51:27,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3189620.0, ans=0.125 2024-08-15 12:51:34,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3189620.0, ans=0.125 2024-08-15 12:51:41,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 150, loss[loss=0.1209, beats_loss=0.009288, ecapa_loss=0.0001403, whisper_loss=0.1102, over 22631.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009549, ecapa_loss=0.0001521, whisper_loss=0.09053, over 2034035.63 frames. ], batch size: 89, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:51:57,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.541e+01 2.794e+01 3.145e+01 4.567e+01, threshold=5.588e+01, percent-clipped=0.0 2024-08-15 12:52:00,726 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-15 12:52:08,768 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 12:52:18,665 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 12:52:24,920 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 18 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-15 12:52:30,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2024-08-15 12:52:41,341 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 12:52:50,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-08-15 12:52:55,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190120.0, ans=0.1 2024-08-15 12:53:02,656 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 12:53:05,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 200, loss[loss=0.1105, beats_loss=0.009661, ecapa_loss=0.0001238, whisper_loss=0.09963, over 23833.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.009687, ecapa_loss=0.0001533, whisper_loss=0.09155, over 2433815.86 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:53:32,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2024-08-15 12:53:33,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190320.0, ans=0.1 2024-08-15 12:53:37,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3190420.0, ans=0.1 2024-08-15 12:53:49,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3190420.0, ans=0.125 2024-08-15 12:53:57,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3190520.0, ans=0.125 2024-08-15 12:54:19,871 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
13 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 12:54:24,670 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 250, loss[loss=0.09599, beats_loss=0.008673, ecapa_loss=0.0001629, whisper_loss=0.08569, over 18714.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.009884, ecapa_loss=0.0001519, whisper_loss=0.09185, over 2762993.23 frames. ], batch size: 70, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:54:28,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190720.0, ans=0.1 2024-08-15 12:54:38,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.264e+01 2.507e+01 2.916e+01 4.701e+01, threshold=5.014e+01, percent-clipped=0.0 2024-08-15 12:54:39,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2024-08-15 12:54:54,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3190920.0, ans=0.125 2024-08-15 12:54:58,734 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 12:54:59,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3190920.0, ans=0.0 2024-08-15 12:55:03,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3190920.0, ans=0.125 2024-08-15 12:55:16,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3191020.0, ans=0.125 2024-08-15 12:55:41,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 300, loss[loss=0.09963, beats_loss=0.01163, ecapa_loss=0.0001579, whisper_loss=0.08642, over 21083.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01006, ecapa_loss=0.0001523, whisper_loss=0.09143, over 2987709.38 frames. ], batch size: 86, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:55:41,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3191220.0, ans=0.125 2024-08-15 12:55:46,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3191220.0, ans=0.05 2024-08-15 12:55:51,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3191220.0, ans=0.125 2024-08-15 12:56:10,104 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 12:56:10,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191320.0, ans=0.1 2024-08-15 12:56:11,255 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 12:56:45,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-08-15 12:56:45,944 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 16 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-15 12:56:47,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-08-15 12:56:57,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3191720.0, ans=0.2 2024-08-15 12:56:58,859 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 350, loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001222, whisper_loss=0.09085, over 20625.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.0102, ecapa_loss=0.0001506, whisper_loss=0.09091, over 3167857.50 frames. ], batch size: 77, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:57:12,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.342e+01 2.524e+01 2.862e+01 4.157e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-15 12:57:15,953 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 12:57:19,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3191820.0, ans=0.125 2024-08-15 12:57:19,205 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-15 12:57:39,763 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 12:57:41,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3191920.0, ans=0.0 2024-08-15 12:57:43,159 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 12:57:53,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3192020.0, ans=0.09899494936611666 2024-08-15 12:57:54,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3192020.0, ans=0.0 2024-08-15 12:58:15,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=12.0 2024-08-15 12:58:16,047 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 400, loss[loss=0.07949, beats_loss=0.01381, ecapa_loss=0.000153, whisper_loss=0.06415, over 16797.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01029, ecapa_loss=0.0001509, whisper_loss=0.09051, over 3309781.29 frames. ], batch size: 72, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:58:18,314 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. limit=10.0 2024-08-15 12:58:39,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3192320.0, ans=0.125 2024-08-15 12:58:46,570 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 12:58:57,481 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-15 12:58:59,384 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 12:59:07,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3192520.0, ans=0.0 2024-08-15 12:59:10,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3192520.0, ans=0.125 2024-08-15 12:59:26,439 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.105e+05 2024-08-15 12:59:35,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 450, loss[loss=0.09516, beats_loss=0.01028, ecapa_loss=0.000127, whisper_loss=0.08361, over 19566.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01026, ecapa_loss=0.0001512, whisper_loss=0.09053, over 3417406.19 frames. 
], batch size: 74, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:59:49,624 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.262e+01 2.454e+01 2.784e+01 4.737e+01, threshold=4.907e+01, percent-clipped=0.0 2024-08-15 12:59:49,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3192820.0, ans=0.125 2024-08-15 13:00:08,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3192920.0, ans=0.1 2024-08-15 13:00:18,681 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 37 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 13:00:43,235 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 13:00:58,585 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 500, loss[loss=0.105, beats_loss=0.01154, ecapa_loss=0.0001431, whisper_loss=0.09202, over 22172.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01022, ecapa_loss=0.0001506, whisper_loss=0.09053, over 3514077.33 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:01:36,607 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 13:01:38,427 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 13:01:39,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.01 vs. 
limit=22.5 2024-08-15 13:01:59,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3193520.0, ans=0.0 2024-08-15 13:02:03,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3193520.0, ans=0.0 2024-08-15 13:02:27,340 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 13:02:30,179 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 550, loss[loss=0.1151, beats_loss=0.01023, ecapa_loss=0.0001445, whisper_loss=0.1034, over 16194.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01022, ecapa_loss=0.0001503, whisper_loss=0.09083, over 3592344.64 frames. ], batch size: 63, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:02:44,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3193720.0, ans=0.125 2024-08-15 13:02:45,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.304e+01 2.513e+01 2.793e+01 3.514e+01, threshold=5.025e+01, percent-clipped=0.0 2024-08-15 13:02:53,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3193820.0, ans=0.0 2024-08-15 13:02:57,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3193820.0, ans=0.02 2024-08-15 13:02:59,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3193820.0, ans=0.0 2024-08-15 13:03:25,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=12.0 2024-08-15 13:03:30,753 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 13:03:44,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194120.0, ans=0.125 2024-08-15 13:03:56,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 600, loss[loss=0.1184, beats_loss=0.00845, ecapa_loss=0.0001521, whisper_loss=0.1084, over 18227.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.000149, whisper_loss=0.09037, over 3623092.14 frames. ], batch size: 69, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:04:00,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3194220.0, ans=6.0 2024-08-15 13:04:01,396 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 29 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 13:04:19,647 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 13:04:22,407 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 21 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-15 13:04:26,493 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 13:04:40,410 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 13:04:47,527 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 13:05:02,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3194620.0, ans=0.125 2024-08-15 13:05:07,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 650, loss[loss=0.103, beats_loss=0.01209, ecapa_loss=0.000149, whisper_loss=0.0894, over 23579.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001494, whisper_loss=0.09049, over 3671124.78 frames. 
], batch size: 94, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:05:18,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.357e+01 2.569e+01 2.869e+01 2.947e+02, threshold=5.138e+01, percent-clipped=4.0 2024-08-15 13:05:30,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3194820.0, ans=0.125 2024-08-15 13:05:42,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.26 vs. limit=22.5 2024-08-15 13:05:45,302 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 13:05:58,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3195120.0, ans=0.0 2024-08-15 13:06:03,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3195120.0, ans=0.0 2024-08-15 13:06:06,235 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 13:06:08,625 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 13:06:12,315 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 700, loss[loss=0.0983, beats_loss=0.01135, ecapa_loss=0.0001183, whisper_loss=0.08576, over 16645.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001495, whisper_loss=0.09037, over 3716491.42 frames. ], batch size: 62, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:06:14,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3195220.0, ans=0.2 2024-08-15 13:06:21,028 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 13:06:33,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3195320.0, ans=0.125 2024-08-15 13:06:41,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3195420.0, ans=0.125 2024-08-15 13:06:46,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3195420.0, ans=0.125 2024-08-15 13:06:57,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3195520.0, ans=0.0 2024-08-15 13:07:16,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 750, loss[loss=0.1184, beats_loss=0.007762, ecapa_loss=0.0001697, whisper_loss=0.1089, over 17272.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001494, whisper_loss=0.08989, over 3731682.66 frames. ], batch size: 68, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:07:28,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.327e+01 2.582e+01 2.848e+01 1.200e+02, threshold=5.164e+01, percent-clipped=2.0 2024-08-15 13:07:32,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3195820.0, ans=0.0 2024-08-15 13:07:34,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3195820.0, ans=0.125 2024-08-15 13:07:36,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3195820.0, ans=0.125 2024-08-15 13:08:05,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3196020.0, ans=0.0 2024-08-15 13:08:15,536 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
30 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 13:08:20,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3196220.0, ans=0.2 2024-08-15 13:08:21,597 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 800, loss[loss=0.0959, beats_loss=0.01038, ecapa_loss=0.0001661, whisper_loss=0.08386, over 21196.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001492, whisper_loss=0.0898, over 3768900.14 frames. ], batch size: 87, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:08:26,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-15 13:08:42,845 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 10 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-15 13:08:44,375 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 11 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 13:08:49,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3196420.0, ans=0.0 2024-08-15 13:08:57,881 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2024-08-15 13:09:26,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3196720.0, ans=0.0 2024-08-15 13:09:27,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 850, loss[loss=0.1038, beats_loss=0.01041, ecapa_loss=0.0001544, whisper_loss=0.09182, over 21878.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001489, whisper_loss=0.08971, over 3771927.94 frames. 
], batch size: 89, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:09:27,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3196720.0, ans=0.125 2024-08-15 13:09:38,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.346e+01 2.636e+01 2.893e+01 3.086e+02, threshold=5.271e+01, percent-clipped=3.0 2024-08-15 13:09:44,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3196820.0, ans=0.1 2024-08-15 13:09:51,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3196820.0, ans=0.125 2024-08-15 13:09:59,895 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 13:10:02,363 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-15 13:10:03,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3196920.0, ans=0.2 2024-08-15 13:10:22,347 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 13:10:24,841 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-15 13:10:32,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3197220.0, ans=0.125 2024-08-15 13:10:33,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 900, loss[loss=0.07826, beats_loss=0.0119, ecapa_loss=0.0001096, whisper_loss=0.06527, over 14758.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001478, whisper_loss=0.08863, over 3767714.48 frames. 
], batch size: 55, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:10:33,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3197220.0, ans=0.125 2024-08-15 13:10:57,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3197320.0, ans=0.1 2024-08-15 13:11:16,292 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 11 from Vox, 55 fro AS 2024-08-15 13:11:20,016 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 13:11:26,842 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:11:32,164 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 27 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 13:11:38,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 950, loss[loss=0.1128, beats_loss=0.01034, ecapa_loss=0.0001557, whisper_loss=0.1009, over 22734.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0001464, whisper_loss=0.08917, over 3776988.58 frames. ], batch size: 91, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:11:39,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3197720.0, ans=0.0 2024-08-15 13:11:41,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3197720.0, ans=0.0 2024-08-15 13:11:50,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.292e+01 2.595e+01 2.867e+01 1.968e+02, threshold=5.190e+01, percent-clipped=1.0 2024-08-15 13:11:50,381 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
23 from LS+wenet, 12 from Vox, 50 fro AS 2024-08-15 13:12:16,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-15 13:12:20,759 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 13:12:26,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3198020.0, ans=0.125 2024-08-15 13:12:28,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3198020.0, ans=0.07 2024-08-15 13:12:28,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3198020.0, ans=0.025 2024-08-15 13:12:34,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198120.0, ans=0.1 2024-08-15 13:12:39,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-08-15 13:12:44,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1000, loss[loss=0.06673, beats_loss=0.01217, ecapa_loss=9.886e-05, whisper_loss=0.05357, over 16907.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0106, ecapa_loss=0.0001449, whisper_loss=0.0889, over 3754508.51 frames. 
], batch size: 62, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:12:48,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3198220.0, ans=0.1 2024-08-15 13:13:07,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3198320.0, ans=0.125 2024-08-15 13:13:09,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3198420.0, ans=0.0 2024-08-15 13:13:10,699 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 13:13:10,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3198420.0, ans=0.1 2024-08-15 13:13:11,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3198420.0, ans=0.0 2024-08-15 13:13:21,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3198420.0, ans=0.2 2024-08-15 13:13:24,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3198520.0, ans=0.125 2024-08-15 13:13:35,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-08-15 13:13:35,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. 
limit=10.0 2024-08-15 13:13:47,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3198620.0, ans=0.125 2024-08-15 13:13:47,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3198620.0, ans=0.125 2024-08-15 13:13:49,668 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1050, loss[loss=0.1061, beats_loss=0.01008, ecapa_loss=0.0001348, whisper_loss=0.09465, over 14969.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0106, ecapa_loss=0.0001452, whisper_loss=0.08849, over 3773852.36 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:14:01,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.361e+01 2.585e+01 2.930e+01 4.862e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 13:14:09,471 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:14:19,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2024-08-15 13:14:38,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3199020.0, ans=0.0 2024-08-15 13:14:43,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3199120.0, ans=0.125 2024-08-15 13:14:54,334 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1100, loss[loss=0.09775, beats_loss=0.01152, ecapa_loss=0.0001278, whisper_loss=0.08495, over 17208.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.08927, over 3768351.69 frames. 
], batch size: 67, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:14:57,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3199220.0, ans=0.05 2024-08-15 13:15:04,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3199220.0, ans=22.5 2024-08-15 13:15:05,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3199220.0, ans=0.0 2024-08-15 13:15:06,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2024-08-15 13:15:15,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3199320.0, ans=0.125 2024-08-15 13:15:17,721 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 13:15:18,389 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-15 13:15:22,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3199420.0, ans=0.125 2024-08-15 13:15:43,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3199520.0, ans=0.125 2024-08-15 13:15:59,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1150, loss[loss=0.1272, beats_loss=0.00727, ecapa_loss=0.0001329, whisper_loss=0.1186, over 18282.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001441, whisper_loss=0.08943, over 3798146.60 frames. 
], batch size: 66, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:16:04,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=22.5 2024-08-15 13:16:11,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.313e+01 2.590e+01 2.898e+01 5.614e+01, threshold=5.180e+01, percent-clipped=1.0 2024-08-15 13:16:20,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3199820.0, ans=0.125 2024-08-15 13:16:25,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3199920.0, ans=0.035 2024-08-15 13:16:25,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3199920.0, ans=0.0 2024-08-15 13:16:26,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3199920.0, ans=0.0 2024-08-15 13:16:28,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3199920.0, ans=15.0 2024-08-15 13:16:30,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3199920.0, ans=0.0 2024-08-15 13:16:32,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3199920.0, ans=0.125 2024-08-15 13:16:43,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3200020.0, ans=0.2 2024-08-15 13:16:47,250 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS 2024-08-15 13:16:49,832 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 13:16:52,481 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 13:17:01,364 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 from AS 2024-08-15 13:17:05,872 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 from AS 2024-08-15 13:17:09,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1200, loss[loss=0.1021, beats_loss=0.009831, ecapa_loss=0.0001529, whisper_loss=0.09077, over 22608.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001447, whisper_loss=0.08953, over 3790259.91 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:17:10,749 WARNING [optim.py:496] (1/4) Scaling gradients by 0.052070412784814835, model_norm_threshold=51.8048095703125 2024-08-15 13:17:10,938 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.791e+07, orig_rms_sq=1.005e-02 2024-08-15 13:17:30,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3200320.0, ans=0.1 2024-08-15 13:17:31,069 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 from AS 2024-08-15 13:17:38,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.16 vs. 
limit=6.0 2024-08-15 13:17:50,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3200520.0, ans=0.1 2024-08-15 13:17:56,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3200520.0, ans=0.125 2024-08-15 13:18:07,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3200620.0, ans=0.125 2024-08-15 13:18:08,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3200620.0, ans=0.1 2024-08-15 13:18:11,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2024-08-15 13:18:13,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3200620.0, ans=0.95 2024-08-15 13:18:14,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3200720.0, ans=0.125 2024-08-15 13:18:15,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1250, loss[loss=0.09735, beats_loss=0.01212, ecapa_loss=0.0001008, whisper_loss=0.08422, over 18190.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001458, whisper_loss=0.08946, over 3780955.53 frames. 
], batch size: 68, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:18:27,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.258e+01 2.452e+01 2.719e+01 9.949e+02, threshold=4.904e+01, percent-clipped=2.0 2024-08-15 13:18:29,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3200820.0, ans=0.025 2024-08-15 13:18:35,640 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 from AS 2024-08-15 13:18:35,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3200820.0, ans=0.125 2024-08-15 13:18:35,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3200820.0, ans=0.1 2024-08-15 13:18:54,136 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 from AS 2024-08-15 13:18:59,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3201020.0, ans=0.5 2024-08-15 13:19:07,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3201120.0, ans=0.125 2024-08-15 13:19:21,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1300, loss[loss=0.1093, beats_loss=0.007585, ecapa_loss=0.0001692, whisper_loss=0.1001, over 14179.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.0001461, whisper_loss=0.08911, over 3778754.61 frames. 
], batch size: 59, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:19:27,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3201220.0, ans=0.0 2024-08-15 13:19:29,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3201220.0, ans=0.2 2024-08-15 13:19:33,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3201320.0, ans=0.0 2024-08-15 13:19:57,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5 2024-08-15 13:19:59,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2024-08-15 13:20:14,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3201620.0, ans=0.2 2024-08-15 13:20:20,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3201620.0, ans=15.0 2024-08-15 13:20:21,004 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 13:20:26,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3201720.0, ans=0.125 2024-08-15 13:20:27,022 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1350, loss[loss=0.1202, beats_loss=0.01013, ecapa_loss=0.0001369, whisper_loss=0.1087, over 22667.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001454, whisper_loss=0.08958, over 3774502.93 frames. 
], batch size: 86, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:20:34,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2024-08-15 13:20:39,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.225e+01 2.528e+01 2.736e+01 6.244e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-15 13:20:48,323 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 from AS 2024-08-15 13:20:49,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3201820.0, ans=0.125 2024-08-15 13:20:56,544 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 from AS 2024-08-15 13:21:22,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3202120.0, ans=0.2 2024-08-15 13:21:32,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3202120.0, ans=0.125 2024-08-15 13:21:34,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1400, loss[loss=0.08985, beats_loss=0.01019, ecapa_loss=0.0001316, whisper_loss=0.07834, over 15630.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001461, whisper_loss=0.09009, over 3769575.70 frames. 
], batch size: 59, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:21:36,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3202220.0, ans=0.125 2024-08-15 13:21:36,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3202220.0, ans=0.125 2024-08-15 13:21:41,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3202220.0, ans=0.125 2024-08-15 13:22:08,598 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 14 from Vox, 31 from AS 2024-08-15 13:22:14,298 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 from AS 2024-08-15 13:22:20,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-15 13:22:25,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3202520.0, ans=0.0 2024-08-15 13:22:41,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3202620.0, ans=0.05 2024-08-15 13:22:45,046 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:22:47,479 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1450, loss[loss=0.1182, beats_loss=0.01049, ecapa_loss=0.0001583, whisper_loss=0.1062, over 19026.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001451, whisper_loss=0.08936, over 3767063.46 frames. 
], batch size: 79, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:23:24,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.256e+01 2.495e+01 2.819e+01 4.681e+02, threshold=4.990e+01, percent-clipped=2.0 2024-08-15 13:23:24,221 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 from AS 2024-08-15 13:23:31,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3202820.0, ans=0.125 2024-08-15 13:23:39,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3202920.0, ans=0.1 2024-08-15 13:23:44,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202920.0, ans=0.1 2024-08-15 13:23:59,788 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 from AS 2024-08-15 13:24:01,495 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 from AS 2024-08-15 13:24:07,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3203020.0, ans=0.07 2024-08-15 13:24:25,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=15.0 2024-08-15 13:24:25,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1500, loss[loss=0.06928, beats_loss=0.00773, ecapa_loss=0.0002036, whisper_loss=0.05951, over 12554.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0105, ecapa_loss=0.0001442, whisper_loss=0.08847, over 3758659.55 frames. 
], batch size: 55, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:24:36,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3203220.0, ans=0.0 2024-08-15 13:24:40,125 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 from AS 2024-08-15 13:24:43,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3203320.0, ans=0.0 2024-08-15 13:24:54,610 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 from AS 2024-08-15 13:24:56,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3203420.0, ans=0.0 2024-08-15 13:25:12,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3203520.0, ans=0.125 2024-08-15 13:25:38,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1550, loss[loss=0.08924, beats_loss=0.01132, ecapa_loss=0.0001311, whisper_loss=0.07661, over 20709.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001439, whisper_loss=0.08923, over 3754075.04 frames. ], batch size: 79, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:25:39,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3203720.0, ans=0.125 2024-08-15 13:25:42,379 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 from AS 2024-08-15 13:25:46,744 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. 
limit=15.0 2024-08-15 13:25:51,801 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.256e+01 2.497e+01 2.794e+01 4.870e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 13:25:51,955 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 from AS 2024-08-15 13:26:20,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-08-15 13:26:24,733 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 12 from Vox, 32 from AS 2024-08-15 13:26:41,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-08-15 13:26:54,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1600, loss[loss=0.1043, beats_loss=0.008561, ecapa_loss=0.0001306, whisper_loss=0.09441, over 16829.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001454, whisper_loss=0.08909, over 3773191.98 frames. ], batch size: 62, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:26:55,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3204220.0, ans=0.125 2024-08-15 13:26:56,193 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 from AS 2024-08-15 13:26:56,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-08-15 13:27:03,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3204220.0, ans=0.2 2024-08-15 13:27:06,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.58 vs. 
limit=22.5 2024-08-15 13:27:20,148 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 13:27:20,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-15 13:27:33,648 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:27:34,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3204420.0, ans=0.125 2024-08-15 13:27:36,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3204420.0, ans=0.2 2024-08-15 13:27:51,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3204520.0, ans=0.2 2024-08-15 13:28:08,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1650, loss[loss=0.09455, beats_loss=0.01029, ecapa_loss=0.0001599, whisper_loss=0.08266, over 22796.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001455, whisper_loss=0.08905, over 3761975.25 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:28:21,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2024-08-15 13:28:21,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.261e+01 2.464e+01 2.812e+01 1.426e+02, threshold=4.927e+01, percent-clipped=1.0 2024-08-15 13:28:33,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3204820.0, ans=0.125 2024-08-15 13:28:38,888 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
36 from LS+wenet, 12 from Vox, 44 from AS 2024-08-15 13:28:43,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3204920.0, ans=0.2 2024-08-15 13:29:09,502 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 19 from Vox, 52 from AS 2024-08-15 13:29:10,858 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 from AS 2024-08-15 13:29:12,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3205120.0, ans=0.125 2024-08-15 13:29:21,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3205220.0, ans=0.125 2024-08-15 13:29:22,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1700, loss[loss=0.1078, beats_loss=0.0107, ecapa_loss=0.0001402, whisper_loss=0.09568, over 20393.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001459, whisper_loss=0.08942, over 3769482.51 frames. ], batch size: 80, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:29:26,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3205220.0, ans=0.1 2024-08-15 13:29:38,390 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 9 from Vox, 28 from AS 2024-08-15 13:29:42,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.93 vs. 
limit=22.5 2024-08-15 13:29:45,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3205320.0, ans=0.035 2024-08-15 13:30:11,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3205520.0, ans=0.2 2024-08-15 13:30:25,936 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-08-15 13:30:31,756 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 15 from Vox, 22 from AS 2024-08-15 13:30:38,588 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1750, loss[loss=0.08418, beats_loss=0.01011, ecapa_loss=0.0001759, whisper_loss=0.07231, over 18348.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.000146, whisper_loss=0.08914, over 3771700.62 frames. ], batch size: 77, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:30:42,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3205720.0, ans=0.0 2024-08-15 13:30:51,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.279e+01 2.476e+01 2.729e+01 6.838e+01, threshold=4.951e+01, percent-clipped=2.0 2024-08-15 13:30:57,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3205820.0, ans=15.0 2024-08-15 13:31:00,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3205820.0, ans=0.125 2024-08-15 13:31:09,153 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.74 vs. 
limit=15.0 2024-08-15 13:31:36,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3206120.0, ans=0.2 2024-08-15 13:31:53,488 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1800, loss[loss=0.1035, beats_loss=0.01042, ecapa_loss=0.0001581, whisper_loss=0.09151, over 21559.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0104, ecapa_loss=0.0001472, whisper_loss=0.08873, over 3788026.46 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:32:00,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3206220.0, ans=0.035 2024-08-15 13:32:02,097 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 from AS 2024-08-15 13:32:03,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3206220.0, ans=0.125 2024-08-15 13:32:42,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3206520.0, ans=0.125 2024-08-15 13:32:47,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3206520.0, ans=0.1 2024-08-15 13:33:06,812 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1850, loss[loss=0.08544, beats_loss=0.0151, ecapa_loss=0.0001009, whisper_loss=0.06933, over 16172.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001467, whisper_loss=0.08936, over 3776140.93 frames. ], batch size: 64, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:33:07,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3206720.0, ans=0.0 2024-08-15 13:33:11,402 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
25 from LS+wenet, 27 from Vox, 35 from AS 2024-08-15 13:33:12,657 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 12 from Vox, 38 from AS 2024-08-15 13:33:16,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3206720.0, ans=0.1 2024-08-15 13:33:19,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3206720.0, ans=0.0 2024-08-15 13:33:20,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.257e+01 2.507e+01 2.743e+01 3.719e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-15 13:33:22,832 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-15 13:33:23,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3206820.0, ans=0.125 2024-08-15 13:33:26,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3206820.0, ans=0.05 2024-08-15 13:33:33,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3206820.0, ans=0.2 2024-08-15 13:33:50,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3207020.0, ans=0.0 2024-08-15 13:33:52,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3207020.0, ans=0.2 2024-08-15 13:34:02,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3207020.0, ans=0.125 2024-08-15 13:34:03,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=22.5 2024-08-15 13:34:05,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3207120.0, ans=0.2 2024-08-15 13:34:05,874 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-15 13:34:16,658 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 from AS 2024-08-15 13:34:21,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1900, loss[loss=0.1016, beats_loss=0.01088, ecapa_loss=0.0001339, whisper_loss=0.0894, over 22371.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001462, whisper_loss=0.08918, over 3782056.33 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:34:24,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3207220.0, ans=0.0 2024-08-15 13:34:27,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2024-08-15 13:34:29,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3207220.0, ans=0.0 2024-08-15 13:34:59,740 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 16 from Vox, 44 from AS 2024-08-15 13:35:22,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3207620.0, ans=0.0 2024-08-15 13:35:32,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3207620.0, ans=0.0 2024-08-15 13:35:36,876 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 1950, loss[loss=0.1339, beats_loss=0.009114, ecapa_loss=0.0001562, whisper_loss=0.1232, over 23823.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01041, ecapa_loss=0.0001457, whisper_loss=0.08902, over 3772625.38 frames. ], batch size: 94, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:35:37,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3207720.0, ans=0.0 2024-08-15 13:35:38,386 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 from AS 2024-08-15 13:35:43,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3207720.0, ans=0.125 2024-08-15 13:35:43,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3207720.0, ans=0.05 2024-08-15 13:35:45,953 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 28 from Vox, 34 from AS 2024-08-15 13:35:49,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3207720.0, ans=0.0 2024-08-15 13:35:49,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3207720.0, ans=0.125 2024-08-15 13:35:49,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.350e+01 2.550e+01 2.908e+01 4.451e+01, threshold=5.100e+01, percent-clipped=0.0 2024-08-15 13:35:54,660 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS 2024-08-15 13:36:02,818 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 13:36:50,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2000, loss[loss=0.09821, beats_loss=0.01152, ecapa_loss=0.0001049, whisper_loss=0.08565, over 16819.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001451, whisper_loss=0.08914, over 3761237.48 frames. 
], batch size: 64, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:36:56,879 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 from AS 2024-08-15 13:37:03,074 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS 2024-08-15 13:37:03,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3208220.0, ans=0.125 2024-08-15 13:37:10,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3208320.0, ans=0.025 2024-08-15 13:37:50,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-15 13:38:08,441 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2050, loss[loss=0.1213, beats_loss=0.008385, ecapa_loss=0.0001402, whisper_loss=0.1115, over 19906.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001451, whisper_loss=0.0898, over 3735373.03 frames. ], batch size: 73, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:38:13,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3208720.0, ans=0.125 2024-08-15 13:38:22,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.501e+01 2.773e+01 1.854e+02, threshold=5.002e+01, percent-clipped=2.0 2024-08-15 13:38:50,996 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
26 from LS+wenet, 18 from Vox, 30 from AS 2024-08-15 13:38:59,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3209020.0, ans=0.125 2024-08-15 13:39:01,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3209020.0, ans=0.025 2024-08-15 13:39:22,690 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2100, loss[loss=0.08176, beats_loss=0.01115, ecapa_loss=0.0001812, whisper_loss=0.0688, over 15045.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001436, whisper_loss=0.08958, over 3751361.57 frames. ], batch size: 62, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:39:46,962 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 16 from Vox, 36 from AS 2024-08-15 13:39:50,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5 2024-08-15 13:39:58,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3209420.0, ans=0.0 2024-08-15 13:40:02,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3209420.0, ans=0.125 2024-08-15 13:40:17,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-15 13:40:25,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3209620.0, ans=0.0 2024-08-15 13:40:32,793 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. 
limit=15.0 2024-08-15 13:40:34,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3209720.0, ans=0.1 2024-08-15 13:40:35,936 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2150, loss[loss=0.1002, beats_loss=0.01051, ecapa_loss=0.0001497, whisper_loss=0.08815, over 15775.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001421, whisper_loss=0.08944, over 3748758.18 frames. ], batch size: 64, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:40:36,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3209720.0, ans=0.125 2024-08-15 13:40:37,625 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 13:40:43,677 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 from AS 2024-08-15 13:40:49,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.347e+01 2.631e+01 2.979e+01 4.158e+01, threshold=5.262e+01, percent-clipped=0.0 2024-08-15 13:40:54,665 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 from AS 2024-08-15 13:40:54,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3209820.0, ans=0.125 2024-08-15 13:41:00,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3209820.0, ans=0.0 2024-08-15 13:41:26,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3210020.0, ans=0.2 2024-08-15 13:41:31,020 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 from AS 2024-08-15 13:41:38,273 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
21 from LS+wenet, 27 from Vox, 35 from AS 2024-08-15 13:41:38,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3210120.0, ans=0.09899494936611666 2024-08-15 13:41:49,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2200, loss[loss=0.1339, beats_loss=0.008463, ecapa_loss=0.0001322, whisper_loss=0.1241, over 16139.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001425, whisper_loss=0.08984, over 3769686.83 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:42:10,827 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 30 from Vox, 32 from AS 2024-08-15 13:42:15,518 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 13:42:20,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-15 13:42:23,476 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 from AS 2024-08-15 13:42:40,965 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 from AS 2024-08-15 13:42:57,613 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-15 13:43:00,835 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 from AS 2024-08-15 13:43:04,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2250, loss[loss=0.1081, beats_loss=0.01108, ecapa_loss=0.0001461, whisper_loss=0.09557, over 19450.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001434, whisper_loss=0.09085, over 3817821.50 frames. 
], batch size: 77, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:43:08,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=15.0 2024-08-15 13:43:09,226 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 from AS 2024-08-15 13:43:15,092 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 from AS 2024-08-15 13:43:17,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.315e+01 2.592e+01 2.973e+01 1.052e+02, threshold=5.184e+01, percent-clipped=4.0 2024-08-15 13:43:18,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3210820.0, ans=0.5 2024-08-15 13:43:27,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3210820.0, ans=0.0 2024-08-15 13:43:31,593 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 26 from Vox, 41 from AS 2024-08-15 13:43:32,907 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 17 from Vox, 31 from AS 2024-08-15 13:43:37,394 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 from AS 2024-08-15 13:43:49,265 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 from AS 2024-08-15 13:44:16,017 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 10 from LS+wenet, 19 from Vox, 28 from AS 2024-08-15 13:44:21,486 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2300, loss[loss=0.09824, beats_loss=0.01244, ecapa_loss=9.416e-05, whisper_loss=0.08485, over 20435.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001439, whisper_loss=0.09105, over 3863006.57 frames. 
], batch size: 76, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:44:29,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3211220.0, ans=0.125 2024-08-15 13:44:31,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3211220.0, ans=0.125 2024-08-15 13:44:45,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3211320.0, ans=0.0 2024-08-15 13:44:46,225 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2024-08-15 13:44:53,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3211320.0, ans=0.0 2024-08-15 13:45:14,367 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 38 from LS+wenet, 22 from Vox, 28 from AS 2024-08-15 13:45:16,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3211520.0, ans=0.125 2024-08-15 13:45:22,217 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 from AS 2024-08-15 13:45:45,522 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 13 from Vox, 35 from AS 2024-08-15 13:45:47,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2350, loss[loss=0.08041, beats_loss=0.01342, ecapa_loss=0.0001447, whisper_loss=0.06554, over 21571.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001444, whisper_loss=0.09093, over 3866451.45 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:45:53,811 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 21 from Vox, 36 from AS 2024-08-15 13:46:01,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3211720.0, ans=0.125 2024-08-15 13:46:03,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.354e+01 2.614e+01 2.902e+01 1.801e+02, threshold=5.228e+01, percent-clipped=1.0 2024-08-15 13:46:07,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3211820.0, ans=0.1 2024-08-15 13:46:14,966 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0751657783985138, model_norm_threshold=52.2847900390625 2024-08-15 13:46:15,144 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.469e+04, grad_sumsq=8.469e+04, orig_rms_sq=1.000e+00 2024-08-15 13:47:13,679 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2400, loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001786, whisper_loss=0.09037, over 21483.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001451, whisper_loss=0.09173, over 3872220.03 frames. ], batch size: 93, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:47:20,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3212220.0, ans=0.1 2024-08-15 13:47:41,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3212320.0, ans=0.1 2024-08-15 13:47:46,574 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 from AS 2024-08-15 13:48:13,029 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
19 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 13:48:25,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3212620.0, ans=12.0 2024-08-15 13:48:29,728 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 26 from Vox, 25 from AS 2024-08-15 13:48:30,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3212620.0, ans=0.0 2024-08-15 13:48:32,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3212620.0, ans=0.125 2024-08-15 13:48:35,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2450, loss[loss=0.1234, beats_loss=0.00986, ecapa_loss=0.0001686, whisper_loss=0.1119, over 22519.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001455, whisper_loss=0.09145, over 3850692.50 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:48:43,826 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 from AS 2024-08-15 13:48:48,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3212720.0, ans=0.125 2024-08-15 13:48:51,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.196e+01 2.471e+01 2.708e+01 6.956e+02, threshold=4.941e+01, percent-clipped=1.0 2024-08-15 13:49:06,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2024-08-15 13:49:19,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.86 vs. 
limit=15.0 2024-08-15 13:49:26,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3213020.0, ans=0.0 2024-08-15 13:49:30,933 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 from AS 2024-08-15 13:49:44,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3213120.0, ans=0.2 2024-08-15 13:49:49,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3213120.0, ans=0.09899494936611666 2024-08-15 13:49:57,811 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2500, loss[loss=0.08361, beats_loss=0.0097, ecapa_loss=0.0001736, whisper_loss=0.07217, over 14606.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01051, ecapa_loss=0.0001453, whisper_loss=0.09171, over 3830559.57 frames. ], batch size: 59, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:49:58,031 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
23 from LS+wenet, 17 from Vox, 26 from AS 2024-08-15 13:50:08,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3213220.0, ans=0.125 2024-08-15 13:50:12,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3213220.0, ans=0.0 2024-08-15 13:50:17,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3213320.0, ans=0.125 2024-08-15 13:50:41,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3213420.0, ans=0.2 2024-08-15 13:50:41,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3213420.0, ans=0.125 2024-08-15 13:50:46,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3213420.0, ans=0.1 2024-08-15 13:50:50,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3213520.0, ans=0.125 2024-08-15 13:51:22,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3213720.0, ans=0.125 2024-08-15 13:51:22,769 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2550, loss[loss=0.1361, beats_loss=0.009256, ecapa_loss=0.0001574, whisper_loss=0.1252, over 23068.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.000146, whisper_loss=0.09161, over 3840864.69 frames. 
], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:51:38,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.247e+01 2.527e+01 2.799e+01 4.421e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-15 13:52:02,065 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 from AS 2024-08-15 13:52:05,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3213920.0, ans=0.1 2024-08-15 13:52:08,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2024-08-15 13:52:19,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3214020.0, ans=15.0 2024-08-15 13:52:50,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3214220.0, ans=0.125 2024-08-15 13:52:51,901 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2600, loss[loss=0.09091, beats_loss=0.01157, ecapa_loss=0.0001541, whisper_loss=0.0778, over 22214.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.000146, whisper_loss=0.09068, over 3849615.54 frames. ], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:52:53,579 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS 2024-08-15 13:52:58,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. 
limit=15.0 2024-08-15 13:53:13,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3214320.0, ans=0.5 2024-08-15 13:53:26,843 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 from AS 2024-08-15 13:53:28,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3214420.0, ans=0.125 2024-08-15 13:53:28,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0 2024-08-15 13:53:47,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3214520.0, ans=0.125 2024-08-15 13:54:07,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3214620.0, ans=0.025 2024-08-15 13:54:17,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2650, loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001416, whisper_loss=0.09178, over 18559.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001465, whisper_loss=0.09084, over 3879941.82 frames. ], batch size: 70, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:54:32,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.308e+01 2.516e+01 2.935e+01 7.349e+01, threshold=5.032e+01, percent-clipped=1.0 2024-08-15 13:54:48,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3214820.0, ans=0.0 2024-08-15 13:55:08,573 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. 
limit=22.5 2024-08-15 13:55:14,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3215020.0, ans=0.0 2024-08-15 13:55:20,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3215020.0, ans=0.125 2024-08-15 13:55:22,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3215020.0, ans=0.125 2024-08-15 13:55:23,577 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 13:55:41,759 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2700, loss[loss=0.07427, beats_loss=0.00959, ecapa_loss=0.0001808, whisper_loss=0.06288, over 16685.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001459, whisper_loss=0.0902, over 3857322.01 frames. ], batch size: 66, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:55:44,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215220.0, ans=0.1 2024-08-15 13:55:49,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3215220.0, ans=0.2 2024-08-15 13:55:54,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3215220.0, ans=0.0 2024-08-15 13:56:00,862 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 from AS 2024-08-15 13:56:16,458 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS 2024-08-15 13:56:19,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2024-08-15 13:56:36,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3215520.0, ans=0.0 2024-08-15 13:56:41,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3215520.0, ans=0.125 2024-08-15 13:56:41,996 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:57:05,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3215620.0, ans=0.125 2024-08-15 13:57:08,436 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2750, loss[loss=0.08257, beats_loss=0.01161, ecapa_loss=0.000173, whisper_loss=0.06923, over 21480.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001451, whisper_loss=0.09091, over 3877565.73 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:57:23,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.384e+01 2.723e+01 3.158e+01 5.499e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-15 13:57:27,999 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 from AS 2024-08-15 13:57:30,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-15 13:57:40,480 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 from AS 2024-08-15 13:57:57,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.22 vs. 
limit=12.0 2024-08-15 13:58:07,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3216020.0, ans=0.07 2024-08-15 13:58:19,456 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS 2024-08-15 13:58:35,338 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2800, loss[loss=0.08488, beats_loss=0.01314, ecapa_loss=0.000139, whisper_loss=0.07035, over 15472.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.0001463, whisper_loss=0.09171, over 3847152.96 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:58:35,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3216220.0, ans=0.1 2024-08-15 13:58:37,725 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 13:58:41,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3216220.0, ans=0.0 2024-08-15 13:58:41,505 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-08-15 13:58:49,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3216220.0, ans=0.05 2024-08-15 13:58:57,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3216320.0, ans=0.2 2024-08-15 13:59:02,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3216320.0, ans=0.1 2024-08-15 13:59:02,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. 
limit=15.0 2024-08-15 13:59:09,853 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.027e-02 2024-08-15 13:59:22,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3216420.0, ans=0.125 2024-08-15 13:59:43,596 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 from AS 2024-08-15 13:59:44,975 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 from AS 2024-08-15 13:59:48,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3216620.0, ans=0.1 2024-08-15 13:59:49,518 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 23 from Vox, 42 from AS 2024-08-15 13:59:51,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3216620.0, ans=0.0 2024-08-15 14:00:02,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2850, loss[loss=0.1065, beats_loss=0.01179, ecapa_loss=0.0001342, whisper_loss=0.09332, over 22477.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01049, ecapa_loss=0.0001473, whisper_loss=0.09172, over 3872294.32 frames. ], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:00:04,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3216720.0, ans=0.125 2024-08-15 14:00:19,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.378e+01 2.685e+01 2.976e+01 3.795e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-15 14:00:20,973 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
23 from LS+wenet, 25 from Vox, 41 from AS 2024-08-15 14:00:21,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-15 14:00:24,930 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=12.0 2024-08-15 14:00:28,385 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 16 from Vox, 30 from AS 2024-08-15 14:00:29,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3216820.0, ans=0.09899494936611666 2024-08-15 14:00:34,862 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 from AS 2024-08-15 14:00:38,012 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 from AS 2024-08-15 14:00:38,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3216920.0, ans=0.0 2024-08-15 14:01:04,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.36 vs. limit=12.0 2024-08-15 14:01:06,475 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.02 vs. limit=22.5 2024-08-15 14:01:06,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0 2024-08-15 14:01:30,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2900, loss[loss=0.1089, beats_loss=0.01256, ecapa_loss=0.0001185, whisper_loss=0.09516, over 22337.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001481, whisper_loss=0.09078, over 3877570.23 frames. 
], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:01:34,633 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS
2024-08-15 14:01:47,706 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS
2024-08-15 14:01:59,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0
2024-08-15 14:02:40,739 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 from AS
2024-08-15 14:02:46,340 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 from AS
2024-08-15 14:02:51,753 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 2950, loss[loss=0.09297, beats_loss=0.01054, ecapa_loss=0.0001881, whisper_loss=0.08054, over 19131.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001487, whisper_loss=0.09028, over 3899448.45 frames.
], batch size: 82, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:02:54,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3217720.0, ans=0.5
2024-08-15 14:03:04,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3217720.0, ans=0.0
2024-08-15 14:03:05,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3217720.0, ans=0.04949747468305833
2024-08-15 14:03:06,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.332e+01 2.577e+01 2.863e+01 4.280e+01, threshold=5.153e+01, percent-clipped=0.0
2024-08-15 14:03:08,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3217820.0, ans=0.0
2024-08-15 14:03:13,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3217820.0, ans=0.0
2024-08-15 14:03:42,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3218020.0, ans=0.05
2024-08-15 14:03:48,542 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
24 from LS+wenet, 23 from Vox, 41 from AS
2024-08-15 14:03:48,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3218020.0, ans=0.125
2024-08-15 14:03:58,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3218120.0, ans=0.125
2024-08-15 14:04:14,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3218120.0, ans=0.0
2024-08-15 14:04:16,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3000, loss[loss=0.1053, beats_loss=0.00985, ecapa_loss=0.0001268, whisper_loss=0.09419, over 23674.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001488, whisper_loss=0.09084, over 3918429.28 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:04:16,024 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-15 14:04:55,011 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on ASR_libri: loss=0.2523, beats_loss=0, ecapa_loss=0.0005381, whisper_loss=0.2469, over 922467.00 frames.
2024-08-15 14:05:14,435 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on SV_voxceleb1: loss=0.004148, beats_loss=0, ecapa_loss=0.0004148, whisper_loss=0, over 939242.00 frames.
2024-08-15 14:06:40,043 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.7686, 2.4891, 2.7073, 2.6047, 3.2240, 2.5847, 2.7670, 2.4501], device='cuda:1')
2024-08-15 14:07:09,225 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-15 14:07:09,229 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-15 14:07:13,454 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts.
21 from LS+wenet, 19 from Vox, 33 from AS
2024-08-15 14:07:24,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3218320.0, ans=0.2
2024-08-15 14:07:37,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=12.0
2024-08-15 14:07:39,734 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 16 from Vox, 39 from AS
2024-08-15 14:07:48,281 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 from AS
2024-08-15 14:07:53,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3218420.0, ans=0.2
2024-08-15 14:07:58,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3218520.0, ans=0.2
2024-08-15 14:07:58,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0
2024-08-15 14:08:04,746 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 from AS
2024-08-15 14:08:20,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0
2024-08-15 14:08:25,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3218620.0, ans=0.2
2024-08-15 14:08:33,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3050, loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001599, whisper_loss=0.08986, over 23196.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01047, ecapa_loss=0.0001509, whisper_loss=0.09171, over 3940631.27 frames.
], batch size: 93, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:08:40,968 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 from AS
2024-08-15 14:08:49,819 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 from AS
2024-08-15 14:08:51,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.307e+01 2.650e+01 2.894e+01 1.730e+02, threshold=5.300e+01, percent-clipped=1.0
2024-08-15 14:09:05,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3218820.0, ans=0.1
2024-08-15 14:09:12,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3218920.0, ans=0.05
2024-08-15 14:09:38,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3219020.0, ans=0.0
2024-08-15 14:09:43,808 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 from AS
2024-08-15 14:09:46,711 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 from AS
2024-08-15 14:09:47,027 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2024-08-15 14:10:01,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3100, loss[loss=0.1044, beats_loss=0.01294, ecapa_loss=0.0001197, whisper_loss=0.0903, over 20814.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001501, whisper_loss=0.09157, over 3959754.52 frames.
], batch size: 80, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:10:11,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3219220.0, ans=0.125
2024-08-15 14:10:11,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3219220.0, ans=0.125
2024-08-15 14:10:26,801 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 11 from Vox, 33 from AS
2024-08-15 14:10:34,537 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 18 from Vox, 50 from AS
2024-08-15 14:10:36,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3219420.0, ans=0.125
2024-08-15 14:10:43,580 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 20 from Vox, 32 from AS
2024-08-15 14:10:43,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3219420.0, ans=0.035
2024-08-15 14:10:52,205 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 12 from Vox, 38 from AS
2024-08-15 14:10:52,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3219520.0, ans=0.125
2024-08-15 14:11:07,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3219620.0, ans=0.125
2024-08-15 14:11:08,334 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 14:11:12,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.39 vs.
limit=10.0
2024-08-15 14:11:22,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3150, loss[loss=0.1118, beats_loss=0.009792, ecapa_loss=0.0001433, whisper_loss=0.1006, over 23736.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001497, whisper_loss=0.09153, over 3972548.41 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:11:38,442 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.271e+01 2.467e+01 2.810e+01 4.738e+01, threshold=4.935e+01, percent-clipped=0.0
2024-08-15 14:11:54,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3219820.0, ans=0.125
2024-08-15 14:11:55,187 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 from AS
2024-08-15 14:11:55,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3219920.0, ans=0.1
2024-08-15 14:12:20,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-15 14:12:25,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3220020.0, ans=0.0
2024-08-15 14:12:29,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3220120.0, ans=0.0
2024-08-15 14:12:42,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3220120.0, ans=0.125
2024-08-15 14:12:48,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3200, loss[loss=0.1134, beats_loss=0.008074, ecapa_loss=0.0001586, whisper_loss=0.1037, over 14483.00 frames.
], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001488, whisper_loss=0.09127, over 3938384.30 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:12:58,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3220220.0, ans=0.125
2024-08-15 14:12:58,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5
2024-08-15 14:13:11,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3220320.0, ans=0.04949747468305833
2024-08-15 14:13:26,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3220420.0, ans=0.125
2024-08-15 14:13:28,339 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 from AS
2024-08-15 14:13:30,008 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 17 from Vox, 29 from AS
2024-08-15 14:13:45,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3220520.0, ans=0.125
2024-08-15 14:13:50,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3220520.0, ans=0.2
2024-08-15 14:14:01,950 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts.
25 from LS+wenet, 20 from Vox, 29 from AS
2024-08-15 14:14:07,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3220620.0, ans=0.125
2024-08-15 14:14:10,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3220620.0, ans=0.125
2024-08-15 14:14:12,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3220720.0, ans=0.1
2024-08-15 14:14:14,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3250, loss[loss=0.1026, beats_loss=0.01093, ecapa_loss=0.0001141, whisper_loss=0.09056, over 19450.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001481, whisper_loss=0.09111, over 3937061.53 frames. ], batch size: 72, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:14:23,710 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-15 14:14:30,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.376e+01 2.667e+01 3.123e+01 1.417e+02, threshold=5.334e+01, percent-clipped=1.0
2024-08-15 14:14:38,599 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS
2024-08-15 14:14:47,418 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 from AS
2024-08-15 14:15:05,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.84 vs. limit=22.5
2024-08-15 14:15:06,195 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 from AS
2024-08-15 14:15:10,679 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts.
22 from LS+wenet, 21 from Vox, 19 from AS
2024-08-15 14:15:23,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0
2024-08-15 14:15:38,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3300, loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001255, whisper_loss=0.08954, over 19522.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001497, whisper_loss=0.09114, over 3894471.79 frames. ], batch size: 76, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:15:47,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3221220.0, ans=0.0
2024-08-15 14:15:51,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3221220.0, ans=0.1
2024-08-15 14:16:39,443 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 14:17:04,224 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3350, loss[loss=0.1301, beats_loss=0.008768, ecapa_loss=0.0001544, whisper_loss=0.1198, over 22546.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.000149, whisper_loss=0.09085, over 3889228.32 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:17:11,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3221720.0, ans=0.125
2024-08-15 14:17:18,094 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 from AS
2024-08-15 14:17:19,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.262e+01 2.579e+01 2.848e+01 8.552e+01, threshold=5.158e+01, percent-clipped=1.0
2024-08-15 14:17:26,936 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts.
16 from LS+wenet, 22 from Vox, 35 from AS
2024-08-15 14:17:27,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3221820.0, ans=0.2
2024-08-15 14:17:32,412 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS
2024-08-15 14:17:37,289 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.922e-02
2024-08-15 14:17:51,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3221920.0, ans=0.125
2024-08-15 14:17:53,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3221920.0, ans=0.125
2024-08-15 14:17:53,885 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 from AS
2024-08-15 14:18:01,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3222020.0, ans=0.0
2024-08-15 14:18:27,262 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 from AS
2024-08-15 14:18:27,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3222120.0, ans=0.025
2024-08-15 14:18:29,822 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3400, loss[loss=0.09305, beats_loss=0.009722, ecapa_loss=0.0001537, whisper_loss=0.08179, over 18929.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.00015, whisper_loss=0.09081, over 3922610.24 frames.
], batch size: 76, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:18:34,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3222220.0, ans=0.125
2024-08-15 14:18:44,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3222220.0, ans=0.125
2024-08-15 14:18:48,634 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 13 from LS+wenet, 20 from Vox, 35 from AS
2024-08-15 14:18:54,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3222320.0, ans=0.05
2024-08-15 14:18:59,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.93 vs. limit=5.0
2024-08-15 14:19:26,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3222520.0, ans=0.125
2024-08-15 14:19:40,722 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 from AS
2024-08-15 14:19:43,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3222620.0, ans=0.125
2024-08-15 14:19:51,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3450, loss[loss=0.09388, beats_loss=0.01114, ecapa_loss=0.0001738, whisper_loss=0.08101, over 19085.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.09069, over 3909494.92 frames. ], batch size: 78, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:20:07,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.344e+01 2.608e+01 2.883e+01 4.857e+01, threshold=5.217e+01, percent-clipped=0.0
2024-08-15 14:20:09,028 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts.
31 from LS+wenet, 20 from Vox, 26 from AS
2024-08-15 14:20:23,386 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 from AS
2024-08-15 14:20:28,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3222920.0, ans=0.125
2024-08-15 14:20:35,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3222920.0, ans=0.1
2024-08-15 14:20:38,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3222920.0, ans=0.2
2024-08-15 14:20:46,671 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.324e-01
2024-08-15 14:20:55,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3223020.0, ans=0.0
2024-08-15 14:21:07,865 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 from AS
2024-08-15 14:21:17,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3500, loss[loss=0.1086, beats_loss=0.007974, ecapa_loss=0.0001806, whisper_loss=0.0988, over 16893.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09134, over 3911158.70 frames. ], batch size: 66, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:21:32,405 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 from AS
2024-08-15 14:21:33,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3223320.0, ans=0.07
2024-08-15 14:21:41,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3223320.0, ans=0.1
2024-08-15 14:21:47,775 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
28 from LS+wenet, 21 from Vox, 44 from AS
2024-08-15 14:22:13,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-08-15 14:22:28,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3223520.0, ans=0.125
2024-08-15 14:22:41,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3223620.0, ans=0.125
2024-08-15 14:22:42,817 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 28 from Vox, 33 from AS
2024-08-15 14:22:49,506 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3550, loss[loss=0.08547, beats_loss=0.01346, ecapa_loss=0.0001147, whisper_loss=0.07086, over 23050.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001499, whisper_loss=0.09068, over 3944472.87 frames. ], batch size: 93, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:22:55,951 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 from AS
2024-08-15 14:22:58,648 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 13 from Vox, 43 from AS
2024-08-15 14:23:02,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.289e+01 2.498e+01 2.772e+01 4.287e+01, threshold=4.995e+01, percent-clipped=0.0
2024-08-15 14:23:15,568 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 from AS
2024-08-15 14:23:20,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3223820.0, ans=0.0
2024-08-15 14:24:02,796 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts.
18 from LS+wenet, 21 from Vox, 36 from AS
2024-08-15 14:24:16,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3224120.0, ans=0.125
2024-08-15 14:24:25,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3600, loss[loss=0.1064, beats_loss=0.007201, ecapa_loss=0.0002126, whisper_loss=0.09709, over 13916.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001504, whisper_loss=0.09009, over 3923939.41 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:24:28,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0
2024-08-15 14:24:36,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3224220.0, ans=0.125
2024-08-15 14:24:47,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3224320.0, ans=0.1
2024-08-15 14:25:09,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3224420.0, ans=0.125
2024-08-15 14:25:26,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3224520.0, ans=10.0
2024-08-15 14:25:26,886 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 29 from Vox, 34 from AS
2024-08-15 14:25:42,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3224620.0, ans=0.125
2024-08-15 14:26:09,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3650, loss[loss=0.09547, beats_loss=0.01245, ecapa_loss=0.000151, whisper_loss=0.08151, over 22635.00 frames.
], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001511, whisper_loss=0.09052, over 3917834.13 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:26:15,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224720.0, ans=0.1
2024-08-15 14:26:27,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3224720.0, ans=0.0
2024-08-15 14:26:30,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.327e+01 2.522e+01 2.934e+01 4.655e+01, threshold=5.044e+01, percent-clipped=0.0
2024-08-15 14:26:35,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0
2024-08-15 14:26:40,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.36 vs. limit=10.0
2024-08-15 14:26:43,821 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts.
23 from LS+wenet, 16 from Vox, 32 from AS
2024-08-15 14:27:10,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3224920.0, ans=0.2
2024-08-15 14:27:34,754 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05073240399360657, model_norm_threshold=50.43817901611328
2024-08-15 14:27:34,964 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.552e+05, grad_sumsq=1.540e+07, orig_rms_sq=1.008e-02
2024-08-15 14:27:44,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3225020.0, ans=0.0
2024-08-15 14:27:58,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3225120.0, ans=0.125
2024-08-15 14:28:12,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3225120.0, ans=0.125
2024-08-15 14:28:23,227 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3700, loss[loss=0.1153, beats_loss=0.01086, ecapa_loss=0.0001434, whisper_loss=0.103, over 22104.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001512, whisper_loss=0.09064, over 3888941.73 frames. ], batch size: 85, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:28:34,283 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 from AS
2024-08-15 14:28:36,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3225220.0, ans=0.1
2024-08-15 14:28:41,988 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts.
21 from LS+wenet, 18 from Vox, 34 from AS
2024-08-15 14:29:07,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3225320.0, ans=0.1
2024-08-15 14:29:10,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0
2024-08-15 14:29:19,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3225420.0, ans=0.5
2024-08-15 14:29:33,820 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 14 from Vox, 32 from AS
2024-08-15 14:29:38,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3225420.0, ans=0.125
2024-08-15 14:29:45,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0
2024-08-15 14:29:56,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0
2024-08-15 14:30:00,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3225520.0, ans=0.125
2024-08-15 14:30:09,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.39 vs. limit=10.0
2024-08-15 14:30:29,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3225620.0, ans=0.125
2024-08-15 14:30:37,133 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3750, loss[loss=0.1232, beats_loss=0.01129, ecapa_loss=0.0001383, whisper_loss=0.1106, over 17243.00 frames.
], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001514, whisper_loss=0.09103, over 3852310.32 frames. ], batch size: 65, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:31:00,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.295e+01 2.515e+01 2.786e+01 9.942e+02, threshold=5.030e+01, percent-clipped=1.0
2024-08-15 14:31:08,450 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 from AS
2024-08-15 14:31:11,686 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS
2024-08-15 14:32:01,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0
2024-08-15 14:32:04,516 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS
2024-08-15 14:32:06,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3226020.0, ans=0.125
2024-08-15 14:32:13,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3226020.0, ans=0.125
2024-08-15 14:32:15,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3226020.0, ans=0.0
2024-08-15 14:32:16,766 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0
2024-08-15 14:32:35,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226120.0, ans=0.1
2024-08-15 14:32:38,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3800, loss[loss=0.1093, beats_loss=0.009799, ecapa_loss=0.0001542, whisper_loss=0.098, over 21293.00 frames.
], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.000151, whisper_loss=0.09141, over 3849985.48 frames. ], batch size: 84, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:32:59,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3226320.0, ans=0.2 2024-08-15 14:33:01,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3226320.0, ans=0.0 2024-08-15 14:33:01,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3226320.0, ans=0.0 2024-08-15 14:33:01,480 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2024-08-15 14:33:26,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3226420.0, ans=0.0 2024-08-15 14:33:56,922 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 14:34:10,458 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3850, loss[loss=0.09514, beats_loss=0.01264, ecapa_loss=0.0001192, whisper_loss=0.08131, over 18743.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001519, whisper_loss=0.09091, over 3831004.18 frames. ], batch size: 76, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:34:12,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226720.0, ans=0.1 2024-08-15 14:34:24,140 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 14:34:26,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.11 vs. 
limit=15.0 2024-08-15 14:34:27,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.293e+01 2.527e+01 2.817e+01 3.723e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-15 14:34:42,897 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 14:34:59,203 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 14:35:17,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2024-08-15 14:35:26,021 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-15 14:35:37,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3227120.0, ans=0.5 2024-08-15 14:35:43,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3900, loss[loss=0.1095, beats_loss=0.01084, ecapa_loss=0.0001623, whisper_loss=0.09704, over 21885.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001536, whisper_loss=0.09133, over 3860113.62 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:35:43,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3227220.0, ans=0.125 2024-08-15 14:35:45,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=12.0 2024-08-15 14:35:56,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3227220.0, ans=0.025 2024-08-15 14:36:15,601 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 14:36:15,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3227320.0, ans=0.125 2024-08-15 14:36:16,345 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-08-15 14:36:20,718 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 14:36:23,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3227420.0, ans=0.125 2024-08-15 14:36:28,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3227420.0, ans=0.125 2024-08-15 14:36:40,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3227520.0, ans=0.07 2024-08-15 14:36:52,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.37 vs. limit=6.0 2024-08-15 14:36:54,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3227620.0, ans=0.1 2024-08-15 14:37:10,545 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 3950, loss[loss=0.1303, beats_loss=0.009393, ecapa_loss=0.0001531, whisper_loss=0.1194, over 19526.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001538, whisper_loss=0.0914, over 3862270.77 frames. 
], batch size: 75, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:37:26,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.465e+01 2.719e+01 3.087e+01 1.515e+02, threshold=5.437e+01, percent-clipped=3.0 2024-08-15 14:37:29,841 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 14:37:34,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227820.0, ans=0.1 2024-08-15 14:37:39,487 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 14:37:39,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3227820.0, ans=0.0 2024-08-15 14:37:51,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3227920.0, ans=0.125 2024-08-15 14:38:23,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3228120.0, ans=0.125 2024-08-15 14:38:39,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4000, loss[loss=0.07578, beats_loss=0.01234, ecapa_loss=0.0001247, whisper_loss=0.06219, over 15394.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001528, whisper_loss=0.09116, over 3877763.06 frames. ], batch size: 63, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:38:43,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3228220.0, ans=0.125 2024-08-15 14:38:48,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.55 vs. 
limit=12.0 2024-08-15 14:38:49,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3228220.0, ans=0.125 2024-08-15 14:38:50,626 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 14:39:11,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-15 14:39:16,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3228420.0, ans=0.125 2024-08-15 14:39:18,244 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-15 14:39:18,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3228420.0, ans=0.0 2024-08-15 14:39:48,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3228620.0, ans=0.025 2024-08-15 14:40:05,378 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4050, loss[loss=0.09425, beats_loss=0.01147, ecapa_loss=0.0001305, whisper_loss=0.08147, over 22479.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01045, ecapa_loss=0.0001534, whisper_loss=0.09135, over 3884251.11 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:40:18,439 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
32 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 14:40:20,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228720.0, ans=0.1 2024-08-15 14:40:24,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.316e+01 2.614e+01 2.943e+01 4.388e+01, threshold=5.229e+01, percent-clipped=0.0 2024-08-15 14:40:32,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3228820.0, ans=0.0 2024-08-15 14:40:33,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3228820.0, ans=0.125 2024-08-15 14:40:43,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3228820.0, ans=0.125 2024-08-15 14:41:06,331 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-15 14:41:14,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3229020.0, ans=0.0 2024-08-15 14:41:29,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=12.0 2024-08-15 14:41:58,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4100, loss[loss=0.094, beats_loss=0.01091, ecapa_loss=0.0001415, whisper_loss=0.08168, over 15042.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01048, ecapa_loss=0.000153, whisper_loss=0.09193, over 3858834.85 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:41:59,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3229220.0, ans=0.0 2024-08-15 14:42:33,126 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 14:42:54,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0 2024-08-15 14:42:55,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3229420.0, ans=0.2 2024-08-15 14:43:00,379 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 14:43:27,431 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 14:43:32,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3229520.0, ans=0.5 2024-08-15 14:43:38,346 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 14:44:04,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4150, loss[loss=0.09772, beats_loss=0.01094, ecapa_loss=0.0001614, whisper_loss=0.08516, over 16472.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0105, ecapa_loss=0.0001522, whisper_loss=0.09175, over 3890879.21 frames. ], batch size: 67, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:44:28,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.317e+01 2.580e+01 2.886e+01 4.298e+01, threshold=5.160e+01, percent-clipped=0.0 2024-08-15 14:44:45,350 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 14:44:57,261 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.49 vs. 
limit=15.0 2024-08-15 14:45:16,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3230120.0, ans=0.0 2024-08-15 14:45:22,829 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 14:45:33,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4200, loss[loss=0.1116, beats_loss=0.009985, ecapa_loss=0.0001661, whisper_loss=0.09991, over 21015.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001509, whisper_loss=0.09151, over 3897210.59 frames. ], batch size: 87, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:45:58,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3230320.0, ans=0.1 2024-08-15 14:46:24,472 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-15 14:46:49,088 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 14:47:00,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230720.0, ans=0.1 2024-08-15 14:47:02,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4250, loss[loss=0.1094, beats_loss=0.009385, ecapa_loss=0.0001856, whisper_loss=0.0982, over 22221.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01053, ecapa_loss=0.0001509, whisper_loss=0.09175, over 3912877.47 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:47:02,498 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 14:47:10,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3230720.0, ans=0.125 2024-08-15 14:47:20,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.308e+01 2.518e+01 2.859e+01 8.550e+01, threshold=5.036e+01, percent-clipped=1.0 2024-08-15 14:47:34,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3230820.0, ans=0.125 2024-08-15 14:47:41,919 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 14:47:48,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230920.0, ans=0.1 2024-08-15 14:48:11,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3231020.0, ans=0.04949747468305833 2024-08-15 14:48:17,158 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 14:48:34,515 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4300, loss[loss=0.09437, beats_loss=0.01173, ecapa_loss=0.0001461, whisper_loss=0.08119, over 19293.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001498, whisper_loss=0.09105, over 3902403.49 frames. 
], batch size: 79, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:49:17,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3231420.0, ans=0.2 2024-08-15 14:49:19,299 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 14:49:24,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3231520.0, ans=0.1 2024-08-15 14:49:35,399 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 14:49:35,874 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.83 vs. limit=10.0 2024-08-15 14:49:45,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-15 14:49:59,342 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4350, loss[loss=0.1117, beats_loss=0.01083, ecapa_loss=0.0001418, whisper_loss=0.09944, over 15821.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01041, ecapa_loss=0.0001507, whisper_loss=0.09165, over 3889145.88 frames. ], batch size: 62, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:50:15,576 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-15 14:50:17,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.376e+01 2.619e+01 2.961e+01 5.969e+01, threshold=5.237e+01, percent-clipped=2.0 2024-08-15 14:50:22,582 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
30 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 14:50:38,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3231920.0, ans=0.1 2024-08-15 14:51:05,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3232020.0, ans=0.2 2024-08-15 14:51:10,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3232120.0, ans=0.05 2024-08-15 14:51:28,312 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4400, loss[loss=0.1296, beats_loss=0.00681, ecapa_loss=0.0001461, whisper_loss=0.1214, over 14841.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01041, ecapa_loss=0.0001503, whisper_loss=0.0921, over 3876932.86 frames. ], batch size: 55, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:51:35,736 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 14:51:41,749 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 14:51:45,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3232320.0, ans=0.125 2024-08-15 14:51:50,018 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 14:51:53,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3232320.0, ans=0.125 2024-08-15 14:52:06,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.04 vs. 
limit=15.0 2024-08-15 14:52:21,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2024-08-15 14:52:36,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3232620.0, ans=0.125 2024-08-15 14:52:49,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-15 14:52:51,294 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4450, loss[loss=0.1101, beats_loss=0.009792, ecapa_loss=0.0001543, whisper_loss=0.09881, over 22690.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.0001497, whisper_loss=0.09178, over 3891003.38 frames. ], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:53:08,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+01 2.314e+01 2.575e+01 2.815e+01 3.995e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-15 14:53:16,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.75 vs. limit=6.0 2024-08-15 14:53:28,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232920.0, ans=0.1 2024-08-15 14:53:33,099 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.97 vs. 
limit=10.0 2024-08-15 14:53:36,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3232920.0, ans=0.125 2024-08-15 14:53:36,835 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.396e-01 2024-08-15 14:53:49,192 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 14:53:51,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3233020.0, ans=0.125 2024-08-15 14:54:06,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3233120.0, ans=0.125 2024-08-15 14:54:12,674 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 14:54:23,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3233220.0, ans=0.125 2024-08-15 14:54:25,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4500, loss[loss=0.09612, beats_loss=0.01093, ecapa_loss=0.0001432, whisper_loss=0.08376, over 16452.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.000149, whisper_loss=0.09144, over 3881341.99 frames. ], batch size: 66, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:54:25,226 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 14:54:28,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3233220.0, ans=0.09899494936611666 2024-08-15 14:54:31,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3233220.0, ans=0.125 2024-08-15 14:54:33,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3233220.0, ans=0.0 2024-08-15 14:54:35,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=3233220.0, ans=15.0 2024-08-15 14:55:00,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3233420.0, ans=0.125 2024-08-15 14:55:02,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3233420.0, ans=0.0 2024-08-15 14:55:39,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3233620.0, ans=0.125 2024-08-15 14:55:51,197 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4550, loss[loss=0.1108, beats_loss=0.01199, ecapa_loss=0.0001197, whisper_loss=0.09766, over 23752.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.00015, whisper_loss=0.09098, over 3877856.86 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:56:07,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.406e+01 2.642e+01 2.962e+01 1.202e+02, threshold=5.284e+01, percent-clipped=1.0 2024-08-15 14:56:10,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3233820.0, ans=0.1 2024-08-15 14:56:29,946 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
30 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 14:56:35,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3233920.0, ans=0.125 2024-08-15 14:57:18,811 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4600, loss[loss=0.1058, beats_loss=0.009604, ecapa_loss=0.0001529, whisper_loss=0.09471, over 22919.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.0001497, whisper_loss=0.09106, over 3899303.29 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:57:23,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3234220.0, ans=0.2 2024-08-15 14:57:23,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3234220.0, ans=0.0 2024-08-15 14:57:43,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3234320.0, ans=0.125 2024-08-15 14:57:44,520 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 14:58:30,311 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 14:58:40,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4650, loss[loss=0.09448, beats_loss=0.01021, ecapa_loss=0.0001662, whisper_loss=0.0826, over 17906.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001497, whisper_loss=0.0908, over 3869184.44 frames. 
], batch size: 75, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:58:42,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3234720.0, ans=0.0 2024-08-15 14:58:43,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=15.0 2024-08-15 14:58:50,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3234720.0, ans=0.125 2024-08-15 14:58:56,436 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.314e+01 2.498e+01 2.884e+01 4.685e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-15 14:59:00,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3234820.0, ans=0.0 2024-08-15 14:59:34,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2024-08-15 15:00:08,340 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4700, loss[loss=0.1218, beats_loss=0.009208, ecapa_loss=0.0001535, whisper_loss=0.1111, over 19928.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01044, ecapa_loss=0.0001499, whisper_loss=0.09154, over 3885549.25 frames. ], batch size: 78, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:00:15,548 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 15:00:43,318 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 15:00:51,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5 2024-08-15 15:00:58,615 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 15:01:00,218 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 15:01:03,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-08-15 15:01:05,901 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 15:01:12,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.15 vs. limit=22.5 2024-08-15 15:01:18,595 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 26 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-15 15:01:33,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4750, loss[loss=0.1195, beats_loss=0.009879, ecapa_loss=0.0001142, whisper_loss=0.1085, over 16185.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.0001492, whisper_loss=0.09177, over 3882301.89 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:01:39,882 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 18 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 15:01:49,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.242e+01 2.450e+01 2.790e+01 3.790e+01, threshold=4.901e+01, percent-clipped=0.0 2024-08-15 15:01:54,375 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 15:02:03,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3235920.0, ans=0.0 2024-08-15 15:02:19,551 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:02:38,037 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
22 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 15:02:51,646 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4800, loss[loss=0.1052, beats_loss=0.00765, ecapa_loss=0.0002281, whisper_loss=0.09523, over 14110.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01046, ecapa_loss=0.0001504, whisper_loss=0.09165, over 3891044.38 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:02:59,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=12.0 2024-08-15 15:03:02,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3236220.0, ans=0.0 2024-08-15 15:03:06,087 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 15:03:22,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3236420.0, ans=0.0 2024-08-15 15:03:30,904 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 15:03:34,295 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 15:03:49,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3236520.0, ans=0.125 2024-08-15 15:03:58,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236620.0, ans=0.1 2024-08-15 15:04:05,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3236620.0, ans=0.125 2024-08-15 15:04:09,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4850, loss[loss=0.09919, beats_loss=0.01136, ecapa_loss=0.0001336, whisper_loss=0.08649, over 23159.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.0001513, whisper_loss=0.09177, over 3867470.42 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:04:09,594 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 15:04:24,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.428e+01 2.638e+01 3.060e+01 4.898e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-15 15:04:28,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3236820.0, ans=0.125 2024-08-15 15:04:32,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3236820.0, ans=0.125 2024-08-15 15:04:33,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3236820.0, ans=0.125 2024-08-15 15:04:34,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3236820.0, ans=0.125 2024-08-15 15:04:41,049 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 15:04:42,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3236920.0, ans=0.0 2024-08-15 15:04:52,151 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 15:04:54,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3237020.0, ans=0.125 2024-08-15 15:04:56,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3237020.0, ans=0.0 2024-08-15 15:05:05,776 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:05:09,469 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 15:05:11,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3237120.0, ans=0.125 2024-08-15 15:05:21,932 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4900, loss[loss=0.08796, beats_loss=0.00987, ecapa_loss=0.0001489, whisper_loss=0.07661, over 19896.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001514, whisper_loss=0.09095, over 3868535.28 frames. ], batch size: 81, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:05:33,262 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 15:05:34,089 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-08-15 15:05:42,923 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-15 15:05:46,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3237320.0, ans=0.125 2024-08-15 15:06:01,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. 
limit=15.0 2024-08-15 15:06:07,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3237520.0, ans=0.1 2024-08-15 15:06:16,376 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 15:06:31,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 4950, loss[loss=0.0815, beats_loss=0.0112, ecapa_loss=0.0001078, whisper_loss=0.06923, over 14425.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.0001516, whisper_loss=0.09099, over 3865392.38 frames. ], batch size: 55, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:06:41,095 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 15:06:44,213 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 15:06:45,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.352e+01 2.615e+01 2.945e+01 2.370e+02, threshold=5.229e+01, percent-clipped=2.0 2024-08-15 15:06:46,997 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 15:06:54,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3237820.0, ans=0.1 2024-08-15 15:06:55,401 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 15:06:55,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3237820.0, ans=0.0 2024-08-15 15:07:05,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3237920.0, ans=0.125 2024-08-15 15:07:12,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3238020.0, ans=0.0 2024-08-15 15:07:19,087 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 15:07:32,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3238120.0, ans=0.0 2024-08-15 15:07:40,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5000, loss[loss=0.08553, beats_loss=0.01089, ecapa_loss=0.0001589, whisper_loss=0.07305, over 17163.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001515, whisper_loss=0.09111, over 3857975.44 frames. ], batch size: 68, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:07:44,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3238220.0, ans=0.0 2024-08-15 15:07:45,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3238220.0, ans=0.0 2024-08-15 15:07:50,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=12.0 2024-08-15 15:07:57,198 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 15:07:58,718 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 15:07:59,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3238320.0, ans=0.0 2024-08-15 15:08:08,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3238420.0, ans=0.0 2024-08-15 15:08:15,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3238420.0, ans=0.07 2024-08-15 15:08:34,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0 2024-08-15 15:08:35,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0 2024-08-15 15:08:43,157 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 15:08:46,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3238620.0, ans=0.0 2024-08-15 15:08:48,454 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5050, loss[loss=0.1, beats_loss=0.01115, ecapa_loss=0.0001573, whisper_loss=0.08728, over 21874.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001507, whisper_loss=0.09129, over 3866895.99 frames. 
], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:08:55,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3238720.0, ans=0.125 2024-08-15 15:08:55,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3238720.0, ans=0.125 2024-08-15 15:09:01,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3238820.0, ans=0.125 2024-08-15 15:09:02,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.271e+01 2.496e+01 2.876e+01 1.159e+02, threshold=4.993e+01, percent-clipped=2.0 2024-08-15 15:09:11,374 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-15 15:09:17,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=12.0 2024-08-15 15:09:17,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.76 vs. limit=15.0 2024-08-15 15:09:34,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3239020.0, ans=0.125 2024-08-15 15:09:51,760 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-15 15:09:52,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3239120.0, ans=0.04949747468305833 2024-08-15 15:09:55,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5100, loss[loss=0.09278, beats_loss=0.0113, ecapa_loss=0.0001361, whisper_loss=0.08013, over 16146.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.000151, whisper_loss=0.09138, over 3844823.58 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:10:10,291 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-15 15:10:20,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3239320.0, ans=0.125 2024-08-15 15:10:23,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3239420.0, ans=0.125 2024-08-15 15:10:29,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239420.0, ans=0.1 2024-08-15 15:10:34,785 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-15 15:10:36,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2024-08-15 15:10:38,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2024-08-15 15:11:03,284 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5150, loss[loss=0.1044, beats_loss=0.01154, ecapa_loss=0.0001367, whisper_loss=0.09151, over 21710.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001505, whisper_loss=0.09056, over 3835188.33 frames. ], batch size: 86, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:11:16,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.367e+01 2.663e+01 3.034e+01 8.372e+01, threshold=5.326e+01, percent-clipped=1.0 2024-08-15 15:11:17,110 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 15:11:24,124 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 15:11:35,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3239920.0, ans=0.125 2024-08-15 15:11:56,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3240020.0, ans=0.09899494936611666 2024-08-15 15:12:15,175 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5200, loss[loss=0.07091, beats_loss=0.01283, ecapa_loss=0.0001124, whisper_loss=0.05695, over 16372.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001502, whisper_loss=0.0903, over 3826622.99 frames. ], batch size: 62, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:12:18,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3240220.0, ans=0.0 2024-08-15 15:12:22,459 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 15:12:25,129 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 15:12:38,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3240320.0, ans=0.0 2024-08-15 15:12:48,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240420.0, ans=0.1 2024-08-15 15:13:00,152 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-15 15:13:01,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3240520.0, ans=0.2 2024-08-15 15:13:04,024 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 15:13:20,738 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 15:13:26,221 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5250, loss[loss=0.1128, beats_loss=0.01143, ecapa_loss=0.0001525, whisper_loss=0.09981, over 22886.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001497, whisper_loss=0.09079, over 3847346.71 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:13:26,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3240720.0, ans=0.2 2024-08-15 15:13:28,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3240720.0, ans=0.0 2024-08-15 15:13:35,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3240720.0, ans=0.125 2024-08-15 15:13:35,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3240720.0, ans=0.0 2024-08-15 15:13:40,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.275e+01 2.578e+01 2.785e+01 8.879e+01, threshold=5.156e+01, percent-clipped=2.0 2024-08-15 15:13:53,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3240920.0, ans=0.125 2024-08-15 15:14:02,563 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 15:14:03,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3240920.0, ans=0.2 2024-08-15 15:14:21,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3241020.0, ans=0.125 2024-08-15 15:14:23,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3241120.0, ans=0.2 2024-08-15 15:14:23,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.28 vs. limit=22.5 2024-08-15 15:14:28,475 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 15:14:35,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3241120.0, ans=0.1 2024-08-15 15:14:37,470 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5300, loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.00014, whisper_loss=0.09166, over 22708.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09052, over 3830975.10 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:14:50,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3241320.0, ans=0.2 2024-08-15 15:14:53,202 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
26 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 15:14:53,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3241320.0, ans=0.1 2024-08-15 15:15:37,340 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-15 15:15:40,984 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 15:15:47,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5350, loss[loss=0.1198, beats_loss=0.0116, ecapa_loss=0.0001558, whisper_loss=0.1067, over 22280.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001494, whisper_loss=0.09085, over 3846895.22 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:15:53,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3241720.0, ans=0.1 2024-08-15 15:15:53,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3241720.0, ans=0.07 2024-08-15 15:16:01,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.335e+01 2.708e+01 3.077e+01 2.135e+02, threshold=5.416e+01, percent-clipped=3.0 2024-08-15 15:16:10,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.83 vs. 
limit=15.0 2024-08-15 15:16:11,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3241820.0, ans=0.125 2024-08-15 15:16:22,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3241920.0, ans=0.0 2024-08-15 15:16:32,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3242020.0, ans=0.2 2024-08-15 15:16:56,974 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5400, loss[loss=0.08594, beats_loss=0.01054, ecapa_loss=0.0001699, whisper_loss=0.0737, over 19093.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001497, whisper_loss=0.09112, over 3865622.90 frames. ], batch size: 79, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:16:59,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3242220.0, ans=0.035 2024-08-15 15:17:03,574 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08951901644468307, model_norm_threshold=54.15595626831055 2024-08-15 15:17:03,744 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.765e+04, grad_sumsq=4.730e+06, orig_rms_sq=1.007e-02 2024-08-15 15:17:06,733 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 15:17:12,412 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 15:17:15,057 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 15:17:18,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3242320.0, ans=0.125 2024-08-15 15:17:25,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3242420.0, ans=0.1 2024-08-15 15:17:32,521 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-15 15:17:48,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-08-15 15:18:07,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5450, loss[loss=0.1013, beats_loss=0.008434, ecapa_loss=0.0001326, whisper_loss=0.09153, over 13962.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.000149, whisper_loss=0.09059, over 3863512.77 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:18:19,837 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-15 15:18:22,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.261e+01 2.541e+01 2.887e+01 6.050e+02, threshold=5.082e+01, percent-clipped=2.0 2024-08-15 15:18:23,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. 
limit=15.0 2024-08-15 15:18:42,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3242920.0, ans=0.0 2024-08-15 15:18:53,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3243020.0, ans=0.0 2024-08-15 15:18:55,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3243020.0, ans=0.1 2024-08-15 15:18:58,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3243020.0, ans=0.04949747468305833 2024-08-15 15:19:04,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=15.0 2024-08-15 15:19:15,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3243120.0, ans=0.2 2024-08-15 15:19:21,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-15 15:19:26,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5500, loss[loss=0.106, beats_loss=0.01075, ecapa_loss=0.000144, whisper_loss=0.09379, over 19229.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001488, whisper_loss=0.09164, over 3900023.98 frames. ], batch size: 74, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:20:00,046 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
26 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 15:20:05,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3243420.0, ans=0.0 2024-08-15 15:20:14,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3243520.0, ans=0.1 2024-08-15 15:20:18,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3243520.0, ans=0.0 2024-08-15 15:20:45,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.79 vs. limit=10.0 2024-08-15 15:20:48,448 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 15:20:49,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5550, loss[loss=0.1098, beats_loss=0.008871, ecapa_loss=0.0001511, whisper_loss=0.09945, over 24401.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001484, whisper_loss=0.09153, over 3926032.30 frames. ], batch size: 93, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:20:58,809 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 15:21:06,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.298e+01 2.585e+01 2.775e+01 4.176e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 15:21:10,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3243820.0, ans=0.125 2024-08-15 15:21:10,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3243820.0, ans=0.0 2024-08-15 15:21:20,402 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-15 15:21:42,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=3244020.0, ans=15.0 2024-08-15 15:22:11,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-15 15:22:13,551 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5600, loss[loss=0.09781, beats_loss=0.01078, ecapa_loss=0.0001406, whisper_loss=0.08562, over 22080.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001486, whisper_loss=0.09141, over 3916172.57 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:22:37,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3244320.0, ans=0.09899494936611666 2024-08-15 15:22:45,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3244420.0, ans=0.05 2024-08-15 15:22:45,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3244420.0, ans=0.1 2024-08-15 15:23:04,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3244520.0, ans=0.0 2024-08-15 15:23:11,263 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-15 15:23:16,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3244620.0, ans=0.125 2024-08-15 15:23:16,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3244620.0, ans=0.125 2024-08-15 15:23:21,149 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 15:23:26,684 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-15 15:23:29,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-15 15:23:35,733 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5650, loss[loss=0.1074, beats_loss=0.009145, ecapa_loss=0.0001342, whisper_loss=0.0969, over 16307.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.000148, whisper_loss=0.09049, over 3919810.10 frames. ], batch size: 61, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:23:44,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=22.5 2024-08-15 15:23:55,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.317e+01 2.489e+01 2.780e+01 3.847e+01, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 15:24:18,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3244920.0, ans=0.0 2024-08-15 15:24:22,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3244920.0, ans=0.125 2024-08-15 15:24:24,108 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-15 15:24:34,739 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:24:39,641 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-15 15:24:41,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3245020.0, ans=0.0 2024-08-15 15:24:43,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3245020.0, ans=0.125 2024-08-15 15:24:51,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3245120.0, ans=0.09899494936611666 2024-08-15 15:25:01,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5700, loss[loss=0.0812, beats_loss=0.01264, ecapa_loss=0.000182, whisper_loss=0.06675, over 16079.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001492, whisper_loss=0.09099, over 3939918.76 frames. ], batch size: 68, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:25:22,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3245320.0, ans=0.1 2024-08-15 15:25:27,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3245320.0, ans=0.2 2024-08-15 15:25:34,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3245420.0, ans=0.04949747468305833 2024-08-15 15:25:55,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3245520.0, ans=0.0 2024-08-15 15:26:08,786 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 15:26:09,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3245620.0, ans=0.2 2024-08-15 15:26:17,099 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
27 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-15 15:26:28,885 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5750, loss[loss=0.09911, beats_loss=0.01085, ecapa_loss=0.0001301, whisper_loss=0.08696, over 21941.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09139, over 3944136.23 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:26:35,858 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2024-08-15 15:26:42,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3245720.0, ans=0.125 2024-08-15 15:26:46,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.388e+01 2.608e+01 2.971e+01 1.987e+02, threshold=5.216e+01, percent-clipped=2.0 2024-08-15 15:27:03,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3245920.0, ans=0.5 2024-08-15 15:27:05,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3245920.0, ans=0.125 2024-08-15 15:27:34,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3246020.0, ans=0.1 2024-08-15 15:27:37,482 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 40 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 15:27:40,542 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 15:27:40,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3246120.0, ans=0.0 2024-08-15 15:27:49,442 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 15:27:53,851 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5800, loss[loss=0.08373, beats_loss=0.01187, ecapa_loss=0.0001686, whisper_loss=0.07017, over 21063.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001504, whisper_loss=0.09083, over 3921817.19 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:27:54,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3246220.0, ans=0.025 2024-08-15 15:27:57,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3246220.0, ans=0.125 2024-08-15 15:28:03,542 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 15:28:12,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3246320.0, ans=0.1 2024-08-15 15:28:26,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3246420.0, ans=0.0 2024-08-15 15:28:26,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3246420.0, ans=0.125 2024-08-15 15:28:52,172 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 15:28:52,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3246520.0, ans=0.0 2024-08-15 15:29:01,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. 
limit=15.0 2024-08-15 15:29:11,622 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5850, loss[loss=0.1157, beats_loss=0.01012, ecapa_loss=0.000143, whisper_loss=0.1041, over 24028.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001501, whisper_loss=0.09088, over 3936348.68 frames. ], batch size: 94, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:29:26,435 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.269e+01 2.533e+01 2.888e+01 4.930e+01, threshold=5.067e+01, percent-clipped=0.0 2024-08-15 15:29:28,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3246820.0, ans=0.1 2024-08-15 15:29:37,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3246820.0, ans=0.125 2024-08-15 15:29:42,622 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 15:29:51,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3246920.0, ans=0.125 2024-08-15 15:30:11,816 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 15:30:15,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3247120.0, ans=0.0 2024-08-15 15:30:25,920 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5900, loss[loss=0.09189, beats_loss=0.01305, ecapa_loss=0.0001404, whisper_loss=0.07744, over 19111.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001497, whisper_loss=0.08984, over 3899105.12 frames. 
], batch size: 78, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:30:26,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3247220.0, ans=0.125 2024-08-15 15:30:59,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3247420.0, ans=0.125 2024-08-15 15:31:23,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3247520.0, ans=0.2 2024-08-15 15:31:29,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-08-15 15:31:34,519 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 15:31:37,182 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 15:31:43,873 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 5950, loss[loss=0.1023, beats_loss=0.01007, ecapa_loss=0.000161, whisper_loss=0.09062, over 19458.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01068, ecapa_loss=0.0001498, whisper_loss=0.08917, over 3878269.80 frames. ], batch size: 79, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:31:46,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247720.0, ans=0.1 2024-08-15 15:31:58,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.298e+01 2.552e+01 2.863e+01 3.856e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-15 15:32:07,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247820.0, ans=0.1 2024-08-15 15:32:08,974 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 15:32:25,695 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 15:32:27,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3248020.0, ans=0.035 2024-08-15 15:32:28,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3248020.0, ans=0.0 2024-08-15 15:32:54,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6000, loss[loss=0.09691, beats_loss=0.01398, ecapa_loss=0.0001074, whisper_loss=0.08185, over 21737.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001501, whisper_loss=0.08978, over 3877020.31 frames. ], batch size: 87, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:32:54,496 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 15:33:33,430 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005302, whisper_loss=0.2464, over 922467.00 frames. 2024-08-15 15:33:54,081 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on SV_voxceleb1: loss=0.004186, beats_loss=0, ecapa_loss=0.0004186, whisper_loss=0, over 939242.00 frames. 2024-08-15 15:35:52,411 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 15:35:52,415 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 15:35:52,537 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
21 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-15 15:35:56,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248220.0, ans=0.1 2024-08-15 15:35:59,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5 2024-08-15 15:36:03,228 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-15 15:36:06,473 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 15:36:07,886 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 32 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-15 15:36:08,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3248320.0, ans=0.0 2024-08-15 15:36:14,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248320.0, ans=0.1 2024-08-15 15:36:17,746 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 15:36:22,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2024-08-15 15:36:23,537 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
22 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-15 15:36:25,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248420.0, ans=0.1 2024-08-15 15:36:31,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3248420.0, ans=0.0 2024-08-15 15:36:34,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3248520.0, ans=0.09899494936611666 2024-08-15 15:36:48,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3248620.0, ans=0.125 2024-08-15 15:36:56,518 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-15 15:37:02,506 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6050, loss[loss=0.1004, beats_loss=0.01088, ecapa_loss=0.0001607, whisper_loss=0.08793, over 22105.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001495, whisper_loss=0.09068, over 3878695.68 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:37:05,320 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 15:37:14,769 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 15:37:16,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.402e+01 2.591e+01 2.977e+01 8.754e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-15 15:37:22,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3248820.0, ans=0.2 2024-08-15 15:37:26,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3248820.0, ans=0.125 2024-08-15 15:38:06,377 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:38:12,788 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6100, loss[loss=0.1072, beats_loss=0.01035, ecapa_loss=0.0001353, whisper_loss=0.09552, over 22362.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.000149, whisper_loss=0.08967, over 3880035.52 frames. ], batch size: 87, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:38:17,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3249220.0, ans=0.0 2024-08-15 15:38:28,730 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 15:39:11,909 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-15 15:39:17,367 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 15:39:22,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6150, loss[loss=0.1203, beats_loss=0.007442, ecapa_loss=0.000161, whisper_loss=0.1113, over 22383.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001492, whisper_loss=0.09005, over 3892011.16 frames. 
], batch size: 88, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:39:29,732 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-15 15:39:37,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.219e+01 2.437e+01 2.676e+01 4.381e+01, threshold=4.874e+01, percent-clipped=0.0 2024-08-15 15:39:37,216 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 15:39:37,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3249820.0, ans=0.0 2024-08-15 15:39:42,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3249820.0, ans=0.2 2024-08-15 15:39:47,509 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-15 15:39:49,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3249820.0, ans=0.2 2024-08-15 15:40:02,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-15 15:40:16,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3250020.0, ans=0.1 2024-08-15 15:40:16,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3250020.0, ans=0.05 2024-08-15 15:40:23,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. 
limit=22.5 2024-08-15 15:40:34,288 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6200, loss[loss=0.1009, beats_loss=0.01009, ecapa_loss=0.0001287, whisper_loss=0.08951, over 15284.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001495, whisper_loss=0.09034, over 3871830.29 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:40:43,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3250220.0, ans=0.07 2024-08-15 15:40:59,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3250320.0, ans=0.125 2024-08-15 15:41:06,804 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-15 15:41:10,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3250420.0, ans=0.1 2024-08-15 15:41:23,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3250520.0, ans=0.1 2024-08-15 15:41:26,891 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 15:41:45,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3250720.0, ans=10.0 2024-08-15 15:41:46,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6250, loss[loss=0.1062, beats_loss=0.008884, ecapa_loss=0.0001853, whisper_loss=0.0955, over 19081.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001497, whisper_loss=0.0902, over 3864201.47 frames. 
], batch size: 77, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:42:01,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.383e+01 2.624e+01 2.920e+01 1.622e+02, threshold=5.248e+01, percent-clipped=1.0 2024-08-15 15:42:09,724 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.565e-01 2024-08-15 15:42:23,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3250920.0, ans=0.5 2024-08-15 15:42:48,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-15 15:42:56,423 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6300, loss[loss=0.1263, beats_loss=0.008994, ecapa_loss=0.0001613, whisper_loss=0.1157, over 14060.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001501, whisper_loss=0.08984, over 3839616.85 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:43:20,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3251320.0, ans=0.125 2024-08-15 15:43:30,024 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-15 15:43:39,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3251520.0, ans=0.1 2024-08-15 15:43:46,863 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-15 15:43:52,002 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 15:43:56,935 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-15 15:43:58,236 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 16 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-15 15:44:06,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6350, loss[loss=0.1381, beats_loss=0.007634, ecapa_loss=0.0001623, whisper_loss=0.1288, over 23346.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.0001512, whisper_loss=0.08929, over 3835528.42 frames. ], batch size: 89, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:44:15,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3251720.0, ans=0.125 2024-08-15 15:44:18,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3251720.0, ans=0.0 2024-08-15 15:44:18,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.73 vs. limit=10.0 2024-08-15 15:44:22,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.294e+01 2.523e+01 2.815e+01 3.585e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-15 15:44:41,002 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 15:45:05,776 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=12.0 2024-08-15 15:45:17,835 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6400, loss[loss=0.1246, beats_loss=0.008991, ecapa_loss=0.0001457, whisper_loss=0.1141, over 19648.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.0001504, whisper_loss=0.08922, over 3833152.25 frames. 
], batch size: 74, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:45:36,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3252320.0, ans=0.0 2024-08-15 15:45:40,548 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:46:27,820 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6450, loss[loss=0.1155, beats_loss=0.01095, ecapa_loss=0.000122, whisper_loss=0.1033, over 23939.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001497, whisper_loss=0.09003, over 3861863.44 frames. ], batch size: 93, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:46:32,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3252720.0, ans=0.125 2024-08-15 15:46:38,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3252720.0, ans=0.1 2024-08-15 15:46:39,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3252720.0, ans=0.0 2024-08-15 15:46:42,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.348e+01 2.696e+01 2.907e+01 4.718e+01, threshold=5.393e+01, percent-clipped=0.0 2024-08-15 15:46:57,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-15 15:47:09,133 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 15:47:09,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3253020.0, ans=0.125 2024-08-15 15:47:19,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-08-15 15:47:31,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0 2024-08-15 15:47:41,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6500, loss[loss=0.1074, beats_loss=0.01027, ecapa_loss=0.0001759, whisper_loss=0.09538, over 21681.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001503, whisper_loss=0.0901, over 3848798.08 frames. ], batch size: 93, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:47:46,006 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 15:47:55,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. 
limit=15.0 2024-08-15 15:48:27,784 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07086287438869476, model_norm_threshold=53.929649353027344 2024-08-15 15:48:27,956 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.590e+04, grad_sumsq=8.502e+06, orig_rms_sq=1.010e-02 2024-08-15 15:48:45,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3253620.0, ans=0.2 2024-08-15 15:48:47,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3253620.0, ans=0.0 2024-08-15 15:48:56,040 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6550, loss[loss=0.09997, beats_loss=0.008724, ecapa_loss=0.0002016, whisper_loss=0.08923, over 21450.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001507, whisper_loss=0.09009, over 3876939.53 frames. ], batch size: 91, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:48:57,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3253720.0, ans=0.2 2024-08-15 15:49:11,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.371e+01 2.638e+01 2.935e+01 7.610e+02, threshold=5.275e+01, percent-clipped=2.0 2024-08-15 15:49:15,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3253820.0, ans=0.125 2024-08-15 15:49:15,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3253820.0, ans=0.0 2024-08-15 15:49:18,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3253820.0, ans=0.0 2024-08-15 15:49:33,764 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-15 15:49:38,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3254020.0, ans=0.0 2024-08-15 15:50:00,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3254120.0, ans=0.0 2024-08-15 15:50:07,730 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6600, loss[loss=0.1155, beats_loss=0.008023, ecapa_loss=0.0001612, whisper_loss=0.1059, over 22483.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001508, whisper_loss=0.09106, over 3915069.18 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:50:33,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3254320.0, ans=0.2 2024-08-15 15:50:43,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-08-15 15:50:50,712 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 15:50:53,768 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 15:51:10,601 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 15:51:13,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3254620.0, ans=0.125 2024-08-15 15:51:19,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6650, loss[loss=0.07494, beats_loss=0.01361, ecapa_loss=0.0001303, whisper_loss=0.06002, over 18757.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001496, whisper_loss=0.09126, over 3951463.87 frames. 
], batch size: 79, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:51:29,710 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 18 from LS+wenet, 36 from Vox, 35 fro AS 2024-08-15 15:51:35,046 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.370e+01 2.592e+01 2.847e+01 4.238e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-15 15:51:38,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3254820.0, ans=0.2 2024-08-15 15:51:38,521 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.868e-02 2024-08-15 15:51:41,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3254820.0, ans=0.2 2024-08-15 15:51:46,734 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-15 15:51:50,295 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2024-08-15 15:52:07,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3255020.0, ans=0.0 2024-08-15 15:52:17,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3255120.0, ans=0.05 2024-08-15 15:52:27,235 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 15:52:30,156 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 15:52:33,177 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6700, loss[loss=0.0699, beats_loss=0.01286, ecapa_loss=0.0001272, whisper_loss=0.05576, over 17861.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001505, whisper_loss=0.09163, over 3957896.48 frames. ], batch size: 73, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:52:37,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3255220.0, ans=0.0 2024-08-15 15:52:48,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3255320.0, ans=0.2 2024-08-15 15:52:55,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3255320.0, ans=0.09899494936611666 2024-08-15 15:53:02,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3255420.0, ans=0.0 2024-08-15 15:53:06,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3255420.0, ans=0.125 2024-08-15 15:53:16,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3255520.0, ans=0.125 2024-08-15 15:53:17,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=12.0 2024-08-15 15:53:19,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3255520.0, ans=0.07 2024-08-15 15:53:23,154 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 15:53:45,409 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6750, loss[loss=0.09168, beats_loss=0.01, ecapa_loss=0.0001384, whisper_loss=0.08029, over 17905.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001498, whisper_loss=0.09096, over 3910215.52 frames. 
], batch size: 70, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:53:48,318 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 15:53:52,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3255720.0, ans=0.125 2024-08-15 15:53:53,054 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 15:54:01,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.293e+01 2.545e+01 2.878e+01 4.170e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 15:54:11,371 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-15 15:54:15,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-15 15:54:19,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3255920.0, ans=0.125 2024-08-15 15:54:24,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.99 vs. limit=15.0 2024-08-15 15:54:26,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3255920.0, ans=0.125 2024-08-15 15:54:38,342 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 15:54:48,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3256120.0, ans=0.0 2024-08-15 15:54:56,439 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6800, loss[loss=0.08724, beats_loss=0.01285, ecapa_loss=0.0001432, whisper_loss=0.07296, over 14442.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001501, whisper_loss=0.09059, over 3872853.61 frames. ], batch size: 60, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:55:05,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3256220.0, ans=0.1 2024-08-15 15:55:25,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=10.0 2024-08-15 15:55:33,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3256420.0, ans=0.1 2024-08-15 15:55:37,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-15 15:55:54,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3256620.0, ans=0.07 2024-08-15 15:55:57,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2024-08-15 15:55:59,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3256620.0, ans=0.125 2024-08-15 15:56:04,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3256620.0, ans=0.04949747468305833 2024-08-15 15:56:06,392 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6850, loss[loss=0.1214, beats_loss=0.008727, ecapa_loss=0.0001487, whisper_loss=0.1111, over 21931.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001492, whisper_loss=0.09058, over 3844876.14 frames. 
], batch size: 82, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:56:10,851 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 12 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 15:56:22,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.268e+01 2.467e+01 2.871e+01 7.953e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-15 15:56:33,068 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 15:56:36,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3256920.0, ans=0.125 2024-08-15 15:56:39,195 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 15:56:48,122 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 15:56:52,220 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 15:56:52,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3257020.0, ans=0.2 2024-08-15 15:57:20,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6900, loss[loss=0.1058, beats_loss=0.01236, ecapa_loss=0.0001446, whisper_loss=0.09197, over 17853.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001497, whisper_loss=0.0905, over 3851775.75 frames. ], batch size: 71, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:57:22,644 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 15:57:30,098 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 15:57:32,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3257220.0, ans=0.0 2024-08-15 15:57:33,475 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 15:57:33,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3257220.0, ans=0.125 2024-08-15 15:57:41,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3257320.0, ans=0.0 2024-08-15 15:57:53,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3257420.0, ans=0.0 2024-08-15 15:57:59,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3257420.0, ans=0.125 2024-08-15 15:58:06,015 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 15:58:09,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3257520.0, ans=0.0 2024-08-15 15:58:12,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3257520.0, ans=22.5 2024-08-15 15:58:32,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2024-08-15 15:58:34,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 6950, loss[loss=0.1164, beats_loss=0.0113, ecapa_loss=0.0001441, whisper_loss=0.1036, over 23014.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001497, whisper_loss=0.09146, over 3842825.34 frames. 
], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:58:34,102 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 15:58:40,855 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 15:58:42,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3257720.0, ans=0.1 2024-08-15 15:58:45,378 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 15:58:49,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.345e+01 2.623e+01 2.937e+01 1.105e+02, threshold=5.245e+01, percent-clipped=3.0 2024-08-15 15:58:57,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2024-08-15 15:59:14,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-15 15:59:22,219 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-15 15:59:24,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3258020.0, ans=0.2 2024-08-15 15:59:34,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.49 vs. limit=22.5 2024-08-15 15:59:44,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7000, loss[loss=0.106, beats_loss=0.008623, ecapa_loss=0.0001403, whisper_loss=0.09594, over 16753.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.0001509, whisper_loss=0.09135, over 3831559.42 frames. 
], batch size: 66, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:59:50,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3258220.0, ans=0.125 2024-08-15 16:00:18,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3258420.0, ans=0.2 2024-08-15 16:00:27,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3258520.0, ans=0.125 2024-08-15 16:00:35,266 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 32 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-15 16:00:42,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3258620.0, ans=0.125 2024-08-15 16:00:43,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0 2024-08-15 16:00:52,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3258720.0, ans=0.125 2024-08-15 16:00:53,702 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7050, loss[loss=0.08654, beats_loss=0.009214, ecapa_loss=0.0001717, whisper_loss=0.07561, over 17978.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001519, whisper_loss=0.09133, over 3852026.92 frames. ], batch size: 77, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:00:54,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258720.0, ans=0.1 2024-08-15 16:01:00,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.66 vs. 
limit=15.0 2024-08-15 16:01:01,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3258720.0, ans=0.0 2024-08-15 16:01:08,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.307e+01 2.519e+01 2.895e+01 2.053e+02, threshold=5.037e+01, percent-clipped=1.0 2024-08-15 16:01:23,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3258920.0, ans=0.0 2024-08-15 16:01:26,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3258920.0, ans=0.125 2024-08-15 16:01:50,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3259120.0, ans=0.0 2024-08-15 16:01:50,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3259120.0, ans=0.125 2024-08-15 16:01:51,168 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-15 16:01:52,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3259120.0, ans=0.1 2024-08-15 16:01:54,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3259120.0, ans=0.125 2024-08-15 16:01:56,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3259120.0, ans=0.0 2024-08-15 16:02:00,169 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 16:02:04,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7100, loss[loss=0.1101, beats_loss=0.009884, ecapa_loss=0.000176, whisper_loss=0.09846, over 20944.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.0001522, whisper_loss=0.09149, over 3859774.05 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:02:26,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-08-15 16:02:32,978 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 16:02:45,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3259520.0, ans=0.125 2024-08-15 16:02:54,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3259520.0, ans=0.1 2024-08-15 16:03:11,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3259620.0, ans=0.0 2024-08-15 16:03:15,618 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7150, loss[loss=0.1095, beats_loss=0.01053, ecapa_loss=0.0001827, whisper_loss=0.09713, over 20796.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001514, whisper_loss=0.09049, over 3885669.38 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:03:31,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.305e+01 2.549e+01 2.852e+01 2.933e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-15 16:03:49,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3259920.0, ans=0.125 2024-08-15 16:04:07,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3260020.0, ans=0.0 2024-08-15 16:04:14,065 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 16:04:21,003 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 16:04:26,737 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7200, loss[loss=0.09179, beats_loss=0.009014, ecapa_loss=0.0001578, whisper_loss=0.0812, over 19072.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001518, whisper_loss=0.09084, over 3883981.83 frames. ], batch size: 77, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:04:31,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3260220.0, ans=0.09899494936611666 2024-08-15 16:04:42,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3260320.0, ans=0.2 2024-08-15 16:04:47,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3260320.0, ans=0.0 2024-08-15 16:05:11,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3260520.0, ans=0.1 2024-08-15 16:05:20,176 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 16:05:37,215 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7250, loss[loss=0.1103, beats_loss=0.01042, ecapa_loss=0.0001361, whisper_loss=0.09851, over 20875.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001521, whisper_loss=0.09091, over 3912375.58 frames. ], batch size: 84, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:05:38,791 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 16:05:40,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. 
limit=15.0 2024-08-15 16:05:46,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3260720.0, ans=0.125 2024-08-15 16:05:52,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.362e+01 2.587e+01 2.816e+01 1.917e+02, threshold=5.173e+01, percent-clipped=1.0 2024-08-15 16:05:57,710 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 32 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 16:06:13,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3260920.0, ans=0.2 2024-08-15 16:06:25,126 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 33 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 16:06:26,210 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-15 16:06:26,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3261020.0, ans=0.125 2024-08-15 16:06:40,344 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 16:06:41,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261120.0, ans=0.1 2024-08-15 16:06:46,029 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.732e+05 2024-08-15 16:06:46,842 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7300, loss[loss=0.1282, beats_loss=0.006394, ecapa_loss=0.0001443, whisper_loss=0.1204, over 15165.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001521, whisper_loss=0.0916, over 3938133.64 frames. 
], batch size: 54, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:06:47,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3261220.0, ans=0.07 2024-08-15 16:06:51,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3261220.0, ans=0.0 2024-08-15 16:06:54,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3261220.0, ans=0.0 2024-08-15 16:06:58,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3261220.0, ans=0.125 2024-08-15 16:07:03,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2024-08-15 16:07:06,335 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-15 16:07:08,235 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.746e-02 2024-08-15 16:07:19,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3261420.0, ans=0.125 2024-08-15 16:07:28,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3261520.0, ans=0.0 2024-08-15 16:07:30,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3261520.0, ans=0.125 2024-08-15 16:07:33,767 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 16:07:39,567 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
26 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-15 16:07:50,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261620.0, ans=0.1 2024-08-15 16:07:53,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3261620.0, ans=0.125 2024-08-15 16:07:57,240 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7350, loss[loss=0.0972, beats_loss=0.01203, ecapa_loss=0.0001175, whisper_loss=0.08399, over 19230.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001507, whisper_loss=0.09144, over 3894975.16 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:08:05,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3261720.0, ans=0.125 2024-08-15 16:08:13,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.328e+01 2.533e+01 2.862e+01 3.908e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-15 16:08:13,354 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 16:08:16,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3261820.0, ans=0.125 2024-08-15 16:08:22,237 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 16:08:36,817 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.393e-01 2024-08-15 16:08:43,377 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 16:09:01,550 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 16:09:08,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7400, loss[loss=0.1077, beats_loss=0.01075, ecapa_loss=0.000123, whisper_loss=0.0957, over 14383.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001505, whisper_loss=0.09115, over 3900695.46 frames. ], batch size: 53, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:09:15,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3262220.0, ans=0.1 2024-08-15 16:09:29,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3262320.0, ans=0.0 2024-08-15 16:09:37,785 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 13 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 16:09:38,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-15 16:09:42,867 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2024-08-15 16:09:47,685 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 16:09:56,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3262520.0, ans=0.0 2024-08-15 16:09:58,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=12.0 2024-08-15 16:09:59,565 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 16:10:01,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3262520.0, ans=0.125 2024-08-15 16:10:08,204 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 16:10:12,534 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 16:10:16,531 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 16:10:17,738 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7450, loss[loss=0.0911, beats_loss=0.01213, ecapa_loss=0.000157, whisper_loss=0.07741, over 18843.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001497, whisper_loss=0.09082, over 3879279.92 frames. ], batch size: 77, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:10:24,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3262720.0, ans=0.125 2024-08-15 16:10:29,764 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=22.5 2024-08-15 16:10:32,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.336e+01 2.535e+01 2.838e+01 5.757e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-15 16:11:01,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-15 16:11:12,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.25 vs. 
limit=12.0 2024-08-15 16:11:20,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3263120.0, ans=0.125 2024-08-15 16:11:27,190 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7500, loss[loss=0.09108, beats_loss=0.01313, ecapa_loss=0.0001143, whisper_loss=0.0768, over 23081.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001509, whisper_loss=0.0911, over 3890159.01 frames. ], batch size: 90, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:11:36,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3263220.0, ans=0.0 2024-08-15 16:11:41,328 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 16:11:45,601 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 16:11:55,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3263420.0, ans=0.0 2024-08-15 16:12:11,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=12.0 2024-08-15 16:12:12,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3263520.0, ans=0.2 2024-08-15 16:12:30,707 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-15 16:12:37,233 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7550, loss[loss=0.117, beats_loss=0.009377, ecapa_loss=0.0001528, whisper_loss=0.1061, over 22482.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01047, ecapa_loss=0.0001511, whisper_loss=0.09178, over 3882118.67 frames. 
], batch size: 91, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:12:38,691 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 16:12:49,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3263720.0, ans=0.2 2024-08-15 16:12:52,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.288e+01 2.542e+01 2.895e+01 9.119e+01, threshold=5.085e+01, percent-clipped=2.0 2024-08-15 16:12:55,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3263820.0, ans=0.125 2024-08-15 16:13:21,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3264020.0, ans=0.5 2024-08-15 16:13:22,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3264020.0, ans=0.125 2024-08-15 16:13:26,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-08-15 16:13:30,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3264020.0, ans=0.0 2024-08-15 16:13:31,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3264020.0, ans=0.2 2024-08-15 16:13:39,769 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-15 16:13:47,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3264220.0, ans=0.0 2024-08-15 16:13:48,449 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7600, loss[loss=0.108, beats_loss=0.009556, ecapa_loss=0.0001666, whisper_loss=0.09681, over 15260.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001502, whisper_loss=0.09155, over 3860822.31 frames. ], batch size: 63, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:14:02,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3264320.0, ans=0.125 2024-08-15 16:14:16,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3264420.0, ans=0.125 2024-08-15 16:14:33,340 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 16:14:33,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3264520.0, ans=0.125 2024-08-15 16:14:43,151 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 16:14:48,089 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.996e+05 2024-08-15 16:15:00,122 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7650, loss[loss=0.09785, beats_loss=0.009464, ecapa_loss=0.0001656, whisper_loss=0.08673, over 20932.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01041, ecapa_loss=0.0001498, whisper_loss=0.09178, over 3862691.78 frames. 
], batch size: 82, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:15:05,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3264720.0, ans=0.1 2024-08-15 16:15:15,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.326e+01 2.582e+01 2.912e+01 5.220e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-15 16:15:22,609 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 16:15:28,621 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 16:15:33,061 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 16:15:48,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3265020.0, ans=0.125 2024-08-15 16:15:48,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3265020.0, ans=0.125 2024-08-15 16:15:50,009 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 16:16:08,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3265120.0, ans=0.125 2024-08-15 16:16:11,163 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7700, loss[loss=0.1202, beats_loss=0.01008, ecapa_loss=0.0001302, whisper_loss=0.1088, over 21800.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01048, ecapa_loss=0.0001501, whisper_loss=0.09095, over 3888843.61 frames. ], batch size: 83, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:16:28,812 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 16:16:53,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.82 vs. limit=22.5 2024-08-15 16:17:06,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3265520.0, ans=0.1 2024-08-15 16:17:24,489 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7750, loss[loss=0.07794, beats_loss=0.01552, ecapa_loss=0.0001532, whisper_loss=0.06089, over 15624.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.00015, whisper_loss=0.08997, over 3869903.24 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:17:32,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3265720.0, ans=0.125 2024-08-15 16:17:46,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.309e+01 2.587e+01 2.792e+01 3.462e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-15 16:18:25,672 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 16:18:32,020 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 16:18:47,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3266120.0, ans=0.125 2024-08-15 16:18:50,832 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7800, loss[loss=0.0841, beats_loss=0.01149, ecapa_loss=0.0001578, whisper_loss=0.07103, over 19291.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.00015, whisper_loss=0.09025, over 3860742.46 frames. ], batch size: 81, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:18:59,554 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 22 from Vox, 40 from AS 2024-08-15 16:19:14,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3266320.0, ans=0.125 2024-08-15 16:19:45,545 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-15 16:19:56,710 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2024-08-15 16:20:00,509 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 from AS 2024-08-15 16:20:20,616 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 11 from LS+wenet, 13 from Vox, 33 from AS 2024-08-15 16:20:33,328 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7850, loss[loss=0.1047, beats_loss=0.0108, ecapa_loss=0.0001222, whisper_loss=0.09265, over 17730.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001501, whisper_loss=0.09066, over 3853668.17 frames. ], batch size: 68, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:20:46,721 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:20:56,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.312e+01 2.657e+01 2.999e+01 5.998e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-15 16:21:07,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3266820.0, ans=0.2 2024-08-15 16:21:10,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3266920.0, ans=0.0 2024-08-15 16:21:33,186 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 from AS 2024-08-15 16:22:16,404 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-15 16:22:21,329 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7900, loss[loss=0.09, beats_loss=0.01011, ecapa_loss=0.0002069, whisper_loss=0.07782, over 21892.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.09016, over 3867229.94 frames. ], batch size: 97, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:22:58,706 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS 2024-08-15 16:23:01,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3267320.0, ans=0.125 2024-08-15 16:23:28,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3267420.0, ans=0.2 2024-08-15 16:23:37,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3267520.0, ans=0.05 2024-08-15 16:23:45,697 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 from AS 2024-08-15 16:23:58,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3267620.0, ans=0.125 2024-08-15 16:24:27,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 7950, loss[loss=0.1079, beats_loss=0.009807, ecapa_loss=0.0001259, whisper_loss=0.09688, over 17687.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001488, whisper_loss=0.09021, over 3840345.34 frames. ], batch size: 67, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:24:52,530 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
37 from LS+wenet, 19 from Vox, 33 from AS 2024-08-15 16:24:53,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3267820.0, ans=0.125 2024-08-15 16:24:53,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.363e+01 2.541e+01 2.931e+01 3.622e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-15 16:24:55,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3267820.0, ans=0.125 2024-08-15 16:25:53,045 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS 2024-08-15 16:26:02,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3268020.0, ans=0.2 2024-08-15 16:26:11,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3268120.0, ans=0.015 2024-08-15 16:26:33,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8000, loss[loss=0.1039, beats_loss=0.0123, ecapa_loss=0.0001193, whisper_loss=0.09044, over 20131.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001473, whisper_loss=0.08995, over 3853222.88 frames. ], batch size: 80, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:26:55,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=12.0 2024-08-15 16:28:15,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8050, loss[loss=0.0894, beats_loss=0.008878, ecapa_loss=0.0001883, whisper_loss=0.07864, over 18462.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.000148, whisper_loss=0.09032, over 3860136.63 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:28:19,897 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
16 from LS+wenet, 19 from Vox, 39 from AS 2024-08-15 16:28:21,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3268720.0, ans=0.1 2024-08-15 16:28:24,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3268720.0, ans=0.125 2024-08-15 16:28:32,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.283e+01 2.526e+01 2.890e+01 4.835e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-15 16:28:38,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3268820.0, ans=0.125 2024-08-15 16:28:42,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3268820.0, ans=0.2 2024-08-15 16:28:49,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3268920.0, ans=0.0 2024-08-15 16:29:05,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3269020.0, ans=0.125 2024-08-15 16:29:06,908 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS 2024-08-15 16:29:34,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3269220.0, ans=0.0 2024-08-15 16:29:35,048 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8100, loss[loss=0.1365, beats_loss=0.007481, ecapa_loss=0.0001614, whisper_loss=0.1274, over 17539.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001479, whisper_loss=0.09039, over 3886183.58 frames. ], batch size: 65, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:29:52,826 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
14 from LS+wenet, 22 from Vox, 35 from AS 2024-08-15 16:29:59,973 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 from AS 2024-08-15 16:30:16,970 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.945e-03 2024-08-15 16:30:18,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3269420.0, ans=0.0 2024-08-15 16:30:18,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2024-08-15 16:30:22,390 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-15 16:30:24,167 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 from AS 2024-08-15 16:30:29,544 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 from AS 2024-08-15 16:30:45,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3269620.0, ans=0.125 2024-08-15 16:30:56,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8150, loss[loss=0.1212, beats_loss=0.00838, ecapa_loss=0.0001668, whisper_loss=0.1112, over 17037.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01072, ecapa_loss=0.0001475, whisper_loss=0.08955, over 3889926.90 frames. ], batch size: 67, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:31:11,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3269720.0, ans=0.125 2024-08-15 16:31:15,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.201e+01 2.455e+01 2.771e+01 3.780e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-15 16:31:15,632 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
35 from LS+wenet, 20 from Vox, 36 from AS 2024-08-15 16:31:32,253 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 from AS 2024-08-15 16:31:44,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2024-08-15 16:32:16,907 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8200, loss[loss=0.1172, beats_loss=0.0103, ecapa_loss=0.0001453, whisper_loss=0.1055, over 22614.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001484, whisper_loss=0.09063, over 3919038.61 frames. ], batch size: 90, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:32:17,008 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 28 from Vox, 42 from AS 2024-08-15 16:32:17,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2024-08-15 16:32:34,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. limit=10.0 2024-08-15 16:32:57,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3270420.0, ans=0.2 2024-08-15 16:33:03,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3270520.0, ans=0.125 2024-08-15 16:33:15,911 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 from AS 2024-08-15 16:33:18,613 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 16:33:23,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3270620.0, ans=0.0 2024-08-15 16:33:34,017 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8250, loss[loss=0.1095, beats_loss=0.007192, ecapa_loss=0.0002002, whisper_loss=0.1004, over 18274.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001484, whisper_loss=0.09048, over 3920446.93 frames. ], batch size: 80, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:33:41,814 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:33:46,648 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 from AS 2024-08-15 16:33:50,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.400e+01 2.685e+01 3.048e+01 2.636e+02, threshold=5.369e+01, percent-clipped=3.0 2024-08-15 16:33:51,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3270820.0, ans=0.125 2024-08-15 16:33:53,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3270820.0, ans=0.125 2024-08-15 16:34:16,459 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 17 from Vox, 36 from AS 2024-08-15 16:34:19,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.43 vs. limit=10.0 2024-08-15 16:34:21,609 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. 
limit=6.0 2024-08-15 16:34:27,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.76 vs. limit=5.0 2024-08-15 16:34:31,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3271020.0, ans=0.0 2024-08-15 16:34:48,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8300, loss[loss=0.09787, beats_loss=0.008808, ecapa_loss=0.0001687, whisper_loss=0.08738, over 14348.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001493, whisper_loss=0.09053, over 3900482.25 frames. ], batch size: 59, lr: 2.70e-03, grad_scale: 1.152921504606847e+18 2024-08-15 16:34:56,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3271220.0, ans=0.125 2024-08-15 16:35:20,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3271420.0, ans=0.125 2024-08-15 16:35:21,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3271420.0, ans=0.0 2024-08-15 16:35:28,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3271420.0, ans=0.125 2024-08-15 16:35:40,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3271520.0, ans=0.95 2024-08-15 16:35:42,058 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 18 from LS+wenet, 25 from Vox, 48 from AS 2024-08-15 16:35:49,234 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 16:35:51,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=15.0 2024-08-15 16:36:02,986 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8350, loss[loss=0.09276, beats_loss=0.01009, ecapa_loss=0.0001704, whisper_loss=0.08097, over 17862.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001501, whisper_loss=0.0904, over 3916129.77 frames. ], batch size: 72, lr: 2.70e-03, grad_scale: 1.152921504606847e+18 2024-08-15 16:36:06,059 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 from AS 2024-08-15 16:36:18,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.348e+01 2.577e+01 2.897e+01 4.165e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-15 16:36:39,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2024-08-15 16:37:05,412 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 from AS 2024-08-15 16:37:07,039 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 from AS 2024-08-15 16:37:11,274 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS 2024-08-15 16:37:17,074 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8400, loss[loss=0.1208, beats_loss=0.008543, ecapa_loss=0.0001696, whisper_loss=0.1106, over 16011.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001505, whisper_loss=0.09085, over 3903973.78 frames. 
], batch size: 63, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:37:20,743 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:37:33,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3272320.0, ans=0.2 2024-08-15 16:37:33,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.52 vs. limit=10.0 2024-08-15 16:37:37,237 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 from AS 2024-08-15 16:37:49,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3272420.0, ans=0.1 2024-08-15 16:37:58,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3272420.0, ans=0.0 2024-08-15 16:37:58,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3272420.0, ans=0.0 2024-08-15 16:38:00,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2024-08-15 16:38:11,527 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.974e+00 2024-08-15 16:38:32,756 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8450, loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001246, whisper_loss=0.08998, over 18957.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001494, whisper_loss=0.09105, over 3902483.62 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:38:38,802 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 16:38:42,266 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 from AS 2024-08-15 16:38:48,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3272820.0, ans=0.125 2024-08-15 16:38:50,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.308e+01 2.508e+01 2.815e+01 5.021e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-15 16:39:10,343 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 from AS 2024-08-15 16:39:12,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3272920.0, ans=0.1 2024-08-15 16:39:17,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3273020.0, ans=0.2 2024-08-15 16:39:23,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3273020.0, ans=0.0 2024-08-15 16:39:25,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.95 vs. limit=22.5 2024-08-15 16:39:28,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3273020.0, ans=0.2 2024-08-15 16:39:29,563 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
30 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 16:39:31,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3273120.0, ans=0.1 2024-08-15 16:39:39,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3273120.0, ans=0.125 2024-08-15 16:39:41,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3273120.0, ans=0.2 2024-08-15 16:39:47,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8500, loss[loss=0.1211, beats_loss=0.008124, ecapa_loss=0.0001902, whisper_loss=0.111, over 19959.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001497, whisper_loss=0.09103, over 3897201.33 frames. ], batch size: 81, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:39:49,563 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 from AS 2024-08-15 16:39:55,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3273220.0, ans=0.125 2024-08-15 16:40:10,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3273320.0, ans=0.125 2024-08-15 16:40:10,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-15 16:40:21,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3273420.0, ans=0.125 2024-08-15 16:40:29,071 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 from AS 2024-08-15 16:40:32,254 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
28 from LS+wenet, 20 from Vox, 34 from AS 2024-08-15 16:40:38,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3273520.0, ans=0.0 2024-08-15 16:40:39,361 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-15 16:40:46,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3273520.0, ans=0.125 2024-08-15 16:41:02,696 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 from AS 2024-08-15 16:41:03,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3273620.0, ans=0.125 2024-08-15 16:41:05,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8550, loss[loss=0.1054, beats_loss=0.0107, ecapa_loss=0.0001648, whisper_loss=0.093, over 16413.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001492, whisper_loss=0.09081, over 3910920.14 frames. ], batch size: 67, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:41:21,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-15 16:41:23,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.398e+01 2.637e+01 2.998e+01 4.357e+01, threshold=5.275e+01, percent-clipped=0.0 2024-08-15 16:41:42,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3273920.0, ans=0.125 2024-08-15 16:41:43,287 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-15 16:41:43,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3273920.0, ans=0.125 2024-08-15 16:41:45,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3273920.0, ans=0.1 2024-08-15 16:41:49,304 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 16 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 16:42:05,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274120.0, ans=0.1 2024-08-15 16:42:21,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8600, loss[loss=0.08423, beats_loss=0.01302, ecapa_loss=0.00013, whisper_loss=0.0699, over 14367.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01056, ecapa_loss=0.0001489, whisper_loss=0.09166, over 3923410.22 frames. ], batch size: 57, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:42:31,602 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 19 from Vox, 46 from AS 2024-08-15 16:42:31,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3274220.0, ans=0.125 2024-08-15 16:42:38,626 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 23 from Vox, 26 from AS 2024-08-15 16:42:38,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3274320.0, ans=0.0 2024-08-15 16:42:39,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-08-15 16:42:55,225 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 16 from Vox, 23 from AS 2024-08-15 16:43:14,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-08-15 16:43:18,034 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-15 16:43:24,005 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 27 from Vox, 23 from AS 2024-08-15 16:43:30,869 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 from AS 2024-08-15 16:43:37,997 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8650, loss[loss=0.08431, beats_loss=0.01028, ecapa_loss=0.000175, whisper_loss=0.07228, over 16524.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01047, ecapa_loss=0.0001489, whisper_loss=0.09204, over 3924423.25 frames. ], batch size: 71, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:43:42,417 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 16:43:44,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3274720.0, ans=0.0 2024-08-15 16:43:55,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.305e+01 2.531e+01 2.832e+01 4.112e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 16:44:21,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3275020.0, ans=10.0 2024-08-15 16:44:21,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3275020.0, ans=0.0 2024-08-15 16:44:29,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.07 vs. 
limit=12.0 2024-08-15 16:44:53,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8700, loss[loss=0.123, beats_loss=0.007951, ecapa_loss=0.000181, whisper_loss=0.1132, over 22100.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.00015, whisper_loss=0.09128, over 3911271.98 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:44:59,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-08-15 16:45:00,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3275220.0, ans=0.125 2024-08-15 16:45:11,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2024-08-15 16:45:20,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.34 vs. limit=10.0 2024-08-15 16:45:28,067 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS 2024-08-15 16:45:38,249 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2024-08-15 16:45:40,490 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 16:45:55,768 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS 2024-08-15 16:46:05,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. 
limit=6.0 2024-08-15 16:46:11,300 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8750, loss[loss=0.1268, beats_loss=0.006262, ecapa_loss=0.0001872, whisper_loss=0.1186, over 22022.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001498, whisper_loss=0.09072, over 3903015.45 frames. ], batch size: 83, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:46:28,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3275820.0, ans=0.125 2024-08-15 16:46:29,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.332e+01 2.573e+01 2.934e+01 5.671e+01, threshold=5.146e+01, percent-clipped=2.0 2024-08-15 16:46:29,725 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 from AS 2024-08-15 16:46:34,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=12.0 2024-08-15 16:47:01,675 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 27 from Vox, 41 from AS 2024-08-15 16:47:14,969 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS 2024-08-15 16:47:35,175 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 16:47:39,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8800, loss[loss=0.08572, beats_loss=0.01038, ecapa_loss=0.0001602, whisper_loss=0.07373, over 23578.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001494, whisper_loss=0.09045, over 3920000.83 frames. ], batch size: 96, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:47:43,663 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.44 vs. 
limit=15.0 2024-08-15 16:47:50,366 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 35 from LS+wenet, 21 from Vox, 28 from AS 2024-08-15 16:47:55,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3276220.0, ans=0.0 2024-08-15 16:48:06,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276320.0, ans=0.1 2024-08-15 16:48:18,195 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 29 from Vox, 39 from AS 2024-08-15 16:48:18,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3276420.0, ans=0.125 2024-08-15 16:48:33,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2024-08-15 16:48:35,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-08-15 16:48:40,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3276520.0, ans=0.125 2024-08-15 16:48:53,704 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 from AS 2024-08-15 16:49:07,437 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-15 16:49:07,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3276620.0, ans=0.2 2024-08-15 16:49:09,551 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 16:49:15,374 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8850, loss[loss=0.109, beats_loss=0.009688, ecapa_loss=0.0001369, whisper_loss=0.09793, over 18390.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001492, whisper_loss=0.09036, over 3930730.28 frames. ], batch size: 71, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:49:34,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3276820.0, ans=0.125 2024-08-15 16:49:36,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.293e+01 2.672e+01 3.024e+01 1.700e+02, threshold=5.345e+01, percent-clipped=3.0 2024-08-15 16:49:37,249 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 16:49:37,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3276820.0, ans=0.125 2024-08-15 16:49:57,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3276920.0, ans=0.125 2024-08-15 16:50:02,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3276920.0, ans=0.0 2024-08-15 16:50:06,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-15 16:50:10,922 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-15 16:50:31,992 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-15 16:50:35,608 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8900, loss[loss=0.1112, beats_loss=0.008985, ecapa_loss=0.0001697, whisper_loss=0.1005, over 21241.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001493, whisper_loss=0.09082, over 3938862.87 frames. 
], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:50:37,348 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-15 16:50:38,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2024-08-15 16:50:42,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3277220.0, ans=0.07 2024-08-15 16:50:53,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-15 16:50:55,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3277320.0, ans=0.125 2024-08-15 16:50:55,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3277320.0, ans=0.125 2024-08-15 16:50:57,316 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:51:36,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3277620.0, ans=0.5 2024-08-15 16:51:49,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 8950, loss[loss=0.09422, beats_loss=0.0118, ecapa_loss=0.0001237, whisper_loss=0.08118, over 14677.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001492, whisper_loss=0.09061, over 3902286.88 frames. ], batch size: 56, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:52:06,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.302e+01 2.531e+01 2.768e+01 4.662e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 16:52:08,644 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 16:52:17,740 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 16 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 16:52:18,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3277920.0, ans=0.1 2024-08-15 16:52:27,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3277920.0, ans=0.0 2024-08-15 16:52:34,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3278020.0, ans=0.1 2024-08-15 16:53:01,951 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9000, loss[loss=0.09395, beats_loss=0.009862, ecapa_loss=0.0001666, whisper_loss=0.08242, over 14437.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001501, whisper_loss=0.09092, over 3917509.51 frames. ], batch size: 58, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:53:01,952 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-15 16:53:39,110 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on ASR_libri: loss=0.2514, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2461, over 922467.00 frames. 2024-08-15 16:53:57,598 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on SV_voxceleb1: loss=0.004212, beats_loss=0, ecapa_loss=0.0004212, whisper_loss=0, over 939242.00 frames. 2024-08-15 16:55:49,231 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 16:55:49,235 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB 2024-08-15 16:55:54,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3278220.0, ans=0.0 2024-08-15 16:55:55,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3278220.0, ans=0.125 2024-08-15 16:55:55,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3278220.0, ans=0.0 2024-08-15 16:56:05,909 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-15 16:56:19,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3278420.0, ans=0.125 2024-08-15 16:56:29,521 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-15 16:56:38,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3278520.0, ans=0.125 2024-08-15 16:56:52,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3278620.0, ans=0.125 2024-08-15 16:56:52,342 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5 2024-08-15 16:56:56,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3278620.0, ans=0.0 2024-08-15 16:57:00,946 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 16:57:02,494 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 16:57:03,765 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9050, loss[loss=0.1015, beats_loss=0.01005, ecapa_loss=0.0001718, whisper_loss=0.08975, over 22542.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001504, whisper_loss=0.09115, over 3892957.95 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:57:11,519 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 16:57:21,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.395e+01 2.682e+01 2.921e+01 1.898e+02, threshold=5.364e+01, percent-clipped=1.0 2024-08-15 16:57:26,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3278820.0, ans=0.125 2024-08-15 16:57:32,115 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 16:57:38,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3278920.0, ans=0.1 2024-08-15 16:57:50,157 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 16:57:52,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3279020.0, ans=0.125 2024-08-15 16:58:09,085 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 16:58:17,921 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9100, loss[loss=0.1074, beats_loss=0.01041, ecapa_loss=0.0001598, whisper_loss=0.09536, over 22876.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001502, whisper_loss=0.09155, over 3886319.60 frames. 
], batch size: 92, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:58:32,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3279320.0, ans=0.0 2024-08-15 16:58:39,463 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-15 16:58:42,645 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-15 16:58:54,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3279420.0, ans=0.125 2024-08-15 16:59:00,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3279520.0, ans=0.0 2024-08-15 16:59:12,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3279520.0, ans=0.125 2024-08-15 16:59:23,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3279620.0, ans=0.0 2024-08-15 16:59:30,362 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9150, loss[loss=0.1183, beats_loss=0.01049, ecapa_loss=0.0001568, whisper_loss=0.1063, over 17934.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01046, ecapa_loss=0.0001505, whisper_loss=0.09216, over 3882453.07 frames. 
], batch size: 69, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:59:47,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.265e+01 2.493e+01 2.729e+01 3.385e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 16:59:49,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279820.0, ans=0.1 2024-08-15 16:59:49,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3279820.0, ans=0.125 2024-08-15 16:59:55,361 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 21 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 17:00:00,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5 2024-08-15 17:00:05,059 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 17:00:13,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3279920.0, ans=0.0 2024-08-15 17:00:20,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3280020.0, ans=0.0 2024-08-15 17:00:21,754 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:00:23,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3280020.0, ans=0.125 2024-08-15 17:00:31,525 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 17:00:46,233 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9200, loss[loss=0.0929, beats_loss=0.01139, ecapa_loss=0.0001535, whisper_loss=0.07998, over 17783.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01057, ecapa_loss=0.0001504, whisper_loss=0.09144, over 3898198.65 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:00:48,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-15 17:00:54,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3280220.0, ans=0.125 2024-08-15 17:00:57,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3280220.0, ans=0.1 2024-08-15 17:00:59,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3280220.0, ans=0.125 2024-08-15 17:01:02,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3280320.0, ans=15.0 2024-08-15 17:01:12,324 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 17:01:24,398 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.257e+05 2024-08-15 17:01:36,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3280520.0, ans=0.1 2024-08-15 17:02:00,522 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9250, loss[loss=0.08523, beats_loss=0.01037, ecapa_loss=0.0001586, whisper_loss=0.07328, over 14859.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001507, whisper_loss=0.09117, over 3896733.81 frames. 
], batch size: 58, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:02:01,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0 2024-08-15 17:02:17,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.360e+01 2.606e+01 2.888e+01 4.280e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 17:02:20,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3280820.0, ans=0.95 2024-08-15 17:03:05,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3281120.0, ans=0.0 2024-08-15 17:03:14,942 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9300, loss[loss=0.1088, beats_loss=0.01073, ecapa_loss=0.0001563, whisper_loss=0.09648, over 17728.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0001502, whisper_loss=0.09201, over 3909241.00 frames. ], batch size: 71, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:03:21,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3281220.0, ans=0.1 2024-08-15 17:03:22,246 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 17:03:31,858 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 17:03:42,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3281320.0, ans=0.125 2024-08-15 17:03:48,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3281420.0, ans=0.0 2024-08-15 17:04:14,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3281520.0, ans=0.125 2024-08-15 17:04:17,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2024-08-15 17:04:32,687 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9350, loss[loss=0.1051, beats_loss=0.009152, ecapa_loss=0.0001782, whisper_loss=0.09412, over 22926.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0105, ecapa_loss=0.0001499, whisper_loss=0.09204, over 3926362.04 frames. ], batch size: 94, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:04:36,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3281720.0, ans=0.025 2024-08-15 17:04:41,832 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 17:04:51,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.266e+01 2.526e+01 2.881e+01 4.072e+01, threshold=5.051e+01, percent-clipped=0.0 2024-08-15 17:04:54,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3281820.0, ans=0.1 2024-08-15 17:04:57,614 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 17:05:16,126 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
17 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 17:05:19,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3282020.0, ans=0.2 2024-08-15 17:05:37,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3282120.0, ans=0.0 2024-08-15 17:05:42,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3282120.0, ans=0.125 2024-08-15 17:05:43,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3282120.0, ans=0.125 2024-08-15 17:05:49,181 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9400, loss[loss=0.08906, beats_loss=0.0122, ecapa_loss=0.0001701, whisper_loss=0.07517, over 20660.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001507, whisper_loss=0.0911, over 3907017.46 frames. ], batch size: 87, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:06:29,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3282420.0, ans=0.2 2024-08-15 17:06:36,670 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-15 17:06:49,846 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 17:07:07,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3282720.0, ans=0.09899494936611666 2024-08-15 17:07:08,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9450, loss[loss=0.1021, beats_loss=0.01129, ecapa_loss=0.0001471, whisper_loss=0.08939, over 22007.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001504, whisper_loss=0.091, over 3902613.43 frames. 
], batch size: 92, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:07:15,218 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 25 from LS+wenet, 10 from Vox, 20 fro AS 2024-08-15 17:07:21,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-15 17:07:27,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.277e+01 2.590e+01 2.788e+01 7.153e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-15 17:07:28,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3282820.0, ans=0.1 2024-08-15 17:07:28,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3282820.0, ans=0.2 2024-08-15 17:08:02,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3283020.0, ans=0.125 2024-08-15 17:08:07,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=12.0 2024-08-15 17:08:22,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3283120.0, ans=15.0 2024-08-15 17:08:26,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9500, loss[loss=0.1053, beats_loss=0.006769, ecapa_loss=0.0002198, whisper_loss=0.09631, over 17828.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001498, whisper_loss=0.09115, over 3898991.65 frames. 
], batch size: 76, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:08:31,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3283220.0, ans=0.0 2024-08-15 17:08:54,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-15 17:09:15,745 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 17:09:32,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. limit=10.0 2024-08-15 17:09:40,449 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9550, loss[loss=0.08879, beats_loss=0.01048, ecapa_loss=0.0001544, whisper_loss=0.07677, over 17329.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001495, whisper_loss=0.0908, over 3898918.48 frames. 
], batch size: 68, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:09:57,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.381e+01 2.631e+01 2.929e+01 4.005e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-15 17:10:01,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3283820.0, ans=0.125 2024-08-15 17:10:06,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3283820.0, ans=0.125 2024-08-15 17:10:09,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3283920.0, ans=0.125 2024-08-15 17:10:15,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3283920.0, ans=0.1 2024-08-15 17:10:32,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3284020.0, ans=0.0 2024-08-15 17:10:39,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3284120.0, ans=0.125 2024-08-15 17:10:42,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3284120.0, ans=0.2 2024-08-15 17:10:51,505 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 17:10:54,232 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9600, loss[loss=0.08383, beats_loss=0.01103, ecapa_loss=0.0001488, whisper_loss=0.07131, over 19483.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001505, whisper_loss=0.09086, over 3863331.64 frames. 
], batch size: 78, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:10:58,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=22.5 2024-08-15 17:11:01,895 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 17:11:07,789 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 17:11:13,589 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 17:11:19,633 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 17:11:28,756 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 17:11:43,702 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 26 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 17:11:53,654 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 17:12:08,228 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9650, loss[loss=0.0856, beats_loss=0.0136, ecapa_loss=0.0001601, whisper_loss=0.0704, over 13968.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001508, whisper_loss=0.09051, over 3857091.56 frames. 
], batch size: 58, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:12:13,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3284720.0, ans=0.0 2024-08-15 17:12:24,878 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:12:25,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.315e+01 2.525e+01 2.831e+01 4.515e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-15 17:12:26,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3284820.0, ans=0.125 2024-08-15 17:12:26,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-15 17:12:31,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-15 17:12:35,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3284820.0, ans=0.125 2024-08-15 17:12:48,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2024-08-15 17:12:49,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3284920.0, ans=0.125 2024-08-15 17:12:51,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3285020.0, ans=0.0 2024-08-15 17:12:56,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.30 vs. 
limit=15.0 2024-08-15 17:13:00,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3285020.0, ans=0.2 2024-08-15 17:13:04,603 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-15 17:13:19,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3285120.0, ans=0.125 2024-08-15 17:13:21,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9700, loss[loss=0.08331, beats_loss=0.01145, ecapa_loss=0.0001433, whisper_loss=0.07043, over 18618.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.000151, whisper_loss=0.09058, over 3852649.23 frames. ], batch size: 77, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:13:25,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3285220.0, ans=0.125 2024-08-15 17:14:01,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3285220.0, ans=0.125 2024-08-15 17:14:11,608 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:14:20,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-08-15 17:14:22,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3285320.0, ans=0.125 2024-08-15 17:14:45,464 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 17:14:55,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3285520.0, ans=0.125 2024-08-15 17:15:17,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3285620.0, ans=0.125 2024-08-15 17:15:19,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9750, loss[loss=0.1188, beats_loss=0.009166, ecapa_loss=0.0001547, whisper_loss=0.1081, over 22728.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001505, whisper_loss=0.09, over 3805733.15 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:15:27,432 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-15 17:15:40,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.300e+01 2.530e+01 2.812e+01 4.314e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 17:15:40,442 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 17:15:42,254 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-15 17:15:43,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3285820.0, ans=0.025 2024-08-15 17:15:56,424 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 37 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 17:16:15,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.70 vs. 
limit=22.5 2024-08-15 17:16:30,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3286020.0, ans=0.125 2024-08-15 17:16:34,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3286020.0, ans=0.125 2024-08-15 17:16:41,206 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 17:16:42,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3286120.0, ans=0.125 2024-08-15 17:16:43,058 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 17:16:46,999 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 17:17:02,578 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9800, loss[loss=0.09161, beats_loss=0.01111, ecapa_loss=0.0001801, whisper_loss=0.0787, over 19304.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001516, whisper_loss=0.09006, over 3811849.21 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:17:11,896 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 17:17:12,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3286220.0, ans=0.1 2024-08-15 17:17:16,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3286220.0, ans=0.125 2024-08-15 17:17:25,229 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 17:17:33,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2024-08-15 17:17:35,745 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-08-15 17:17:49,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2024-08-15 17:18:25,438 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 17:18:35,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3286620.0, ans=0.125 2024-08-15 17:18:45,840 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-15 17:18:53,638 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9850, loss[loss=0.09738, beats_loss=0.01138, ecapa_loss=0.0001508, whisper_loss=0.0845, over 22644.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001519, whisper_loss=0.09082, over 3818253.41 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:19:24,296 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.372e+01 2.632e+01 2.935e+01 4.456e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-15 17:19:28,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3286820.0, ans=0.1 2024-08-15 17:19:35,910 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 17:20:12,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.42 vs. 
limit=10.0 2024-08-15 17:20:34,075 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:20:36,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3287120.0, ans=0.1 2024-08-15 17:20:43,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2024-08-15 17:20:58,525 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9900, loss[loss=0.104, beats_loss=0.01183, ecapa_loss=0.0001229, whisper_loss=0.09094, over 23395.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001513, whisper_loss=0.09006, over 3861278.85 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:21:03,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-15 17:21:19,440 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 17:21:43,535 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. 
limit=10.0 2024-08-15 17:21:45,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3287320.0, ans=0.1 2024-08-15 17:21:55,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3287420.0, ans=0.125 2024-08-15 17:22:15,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3287520.0, ans=0.1 2024-08-15 17:22:42,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3287620.0, ans=0.0 2024-08-15 17:22:42,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3287620.0, ans=0.125 2024-08-15 17:22:54,026 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 17:23:01,454 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 9950, loss[loss=0.1052, beats_loss=0.008948, ecapa_loss=0.000171, whisper_loss=0.09455, over 20712.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.09047, over 3856937.74 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:23:19,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-08-15 17:23:31,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.466e+01 2.723e+01 3.016e+01 5.091e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-15 17:23:58,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2024-08-15 17:24:08,179 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 17:24:34,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3288120.0, ans=0.0 2024-08-15 17:24:35,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3288120.0, ans=0.125 2024-08-15 17:24:50,087 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10000, loss[loss=0.08131, beats_loss=0.0122, ecapa_loss=0.0001544, whisper_loss=0.06756, over 17886.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001503, whisper_loss=0.09043, over 3884714.83 frames. ], batch size: 76, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:25:09,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3288320.0, ans=0.0 2024-08-15 17:25:34,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3288420.0, ans=10.0 2024-08-15 17:25:37,428 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 17:25:52,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3288520.0, ans=0.125 2024-08-15 17:25:52,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3288520.0, ans=0.1 2024-08-15 17:25:54,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.57 vs. 
limit=12.0 2024-08-15 17:25:59,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3288620.0, ans=0.125 2024-08-15 17:26:18,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10050, loss[loss=0.08877, beats_loss=0.01272, ecapa_loss=0.0001109, whisper_loss=0.07494, over 20366.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001497, whisper_loss=0.09021, over 3870780.77 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:26:20,180 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 17:26:23,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3288720.0, ans=0.2 2024-08-15 17:26:26,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2024-08-15 17:26:30,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3288720.0, ans=0.05 2024-08-15 17:26:34,055 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=15.0 2024-08-15 17:26:40,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.323e+01 2.519e+01 2.738e+01 4.374e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-15 17:26:45,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3288820.0, ans=0.5 2024-08-15 17:27:28,541 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-15 17:27:45,038 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 17:27:47,834 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10100, loss[loss=0.09516, beats_loss=0.01034, ecapa_loss=0.0001276, whisper_loss=0.08354, over 16427.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001504, whisper_loss=0.09072, over 3899848.42 frames. ], batch size: 64, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:27:59,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3289220.0, ans=0.0 2024-08-15 17:28:18,003 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 17:28:20,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0 2024-08-15 17:28:30,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3289420.0, ans=0.125 2024-08-15 17:28:31,571 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-15 17:28:32,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3289420.0, ans=0.0 2024-08-15 17:28:57,471 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 17:29:09,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=22.5 2024-08-15 17:29:18,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10150, loss[loss=0.1174, beats_loss=0.01048, ecapa_loss=0.0001772, whisper_loss=0.1051, over 21898.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001511, whisper_loss=0.0907, over 3928053.36 frames. 
], batch size: 86, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:29:26,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289720.0, ans=0.1 2024-08-15 17:29:28,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3289720.0, ans=0.125 2024-08-15 17:29:39,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.314e+01 2.537e+01 2.890e+01 1.648e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-15 17:29:48,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3289820.0, ans=0.125 2024-08-15 17:30:08,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3290020.0, ans=0.1 2024-08-15 17:30:19,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3290020.0, ans=0.125 2024-08-15 17:30:39,914 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 17:30:41,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10200, loss[loss=0.1141, beats_loss=0.009424, ecapa_loss=0.0001866, whisper_loss=0.1028, over 22273.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001513, whisper_loss=0.09075, over 3928556.49 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:30:48,408 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 17:31:07,603 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 17:31:13,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3290420.0, ans=0.2 2024-08-15 17:31:17,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3290420.0, ans=0.1 2024-08-15 17:31:24,346 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-15 17:31:24,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3290420.0, ans=0.0 2024-08-15 17:31:24,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-15 17:31:35,022 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-15 17:31:49,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3290620.0, ans=0.0 2024-08-15 17:32:04,159 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 17:32:05,221 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10250, loss[loss=0.08032, beats_loss=0.01265, ecapa_loss=0.0001196, whisper_loss=0.06647, over 17541.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001511, whisper_loss=0.09029, over 3958156.69 frames. 
], batch size: 72, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:32:12,501 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.656e+01 2024-08-15 17:32:25,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.349e+01 2.551e+01 2.894e+01 3.006e+02, threshold=5.101e+01, percent-clipped=3.0 2024-08-15 17:32:37,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3290920.0, ans=0.0 2024-08-15 17:32:43,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3290920.0, ans=0.125 2024-08-15 17:32:46,194 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 17:33:26,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10300, loss[loss=0.1271, beats_loss=0.008802, ecapa_loss=0.0001267, whisper_loss=0.117, over 21647.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.000151, whisper_loss=0.08988, over 3957587.19 frames. ], batch size: 80, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:33:31,295 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 13 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 17:33:32,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3291220.0, ans=0.0 2024-08-15 17:33:32,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3291220.0, ans=0.125 2024-08-15 17:33:42,248 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-15 17:33:49,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3291320.0, ans=0.2 2024-08-15 17:33:53,754 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 32 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 17:33:54,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3291320.0, ans=0.02 2024-08-15 17:34:05,021 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 17:34:21,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3291520.0, ans=0.0 2024-08-15 17:34:34,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3291620.0, ans=0.125 2024-08-15 17:34:35,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-15 17:34:38,223 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 17:34:43,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10350, loss[loss=0.1208, beats_loss=0.008785, ecapa_loss=0.0001345, whisper_loss=0.1107, over 21046.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001509, whisper_loss=0.09001, over 3973092.08 frames. ], batch size: 79, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:34:43,688 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 17:35:02,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.326e+01 2.650e+01 2.958e+01 2.904e+02, threshold=5.299e+01, percent-clipped=2.0 2024-08-15 17:35:04,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3291820.0, ans=0.2 2024-08-15 17:35:07,342 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-15 17:35:12,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291820.0, ans=0.1 2024-08-15 17:35:16,877 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 17:35:20,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3291920.0, ans=0.125 2024-08-15 17:35:21,443 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 17:35:21,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291920.0, ans=0.1 2024-08-15 17:35:23,058 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 17:35:35,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3292020.0, ans=0.0 2024-08-15 17:35:42,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3292020.0, ans=0.125 2024-08-15 17:36:00,803 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10400, loss[loss=0.07024, beats_loss=0.01558, ecapa_loss=0.0001377, whisper_loss=0.05328, over 20278.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001501, whisper_loss=0.09015, over 3955331.35 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:36:12,690 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.60 vs. limit=10.0 2024-08-15 17:36:14,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3292320.0, ans=0.0 2024-08-15 17:36:23,649 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:36:24,898 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 17:36:35,066 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 17:36:36,441 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-15 17:36:47,989 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-15 17:36:48,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3292520.0, ans=0.125 2024-08-15 17:37:02,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. limit=10.0 2024-08-15 17:37:03,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3292620.0, ans=0.0 2024-08-15 17:37:14,070 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10450, loss[loss=0.09875, beats_loss=0.01173, ecapa_loss=0.0001263, whisper_loss=0.08576, over 22541.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001497, whisper_loss=0.08968, over 3918039.95 frames. 
], batch size: 90, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:37:23,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2024-08-15 17:37:25,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3292720.0, ans=0.125 2024-08-15 17:37:31,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.170e+01 2.524e+01 2.860e+01 1.816e+02, threshold=5.048e+01, percent-clipped=1.0 2024-08-15 17:37:36,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3292820.0, ans=10.0 2024-08-15 17:37:37,542 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-15 17:37:38,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3292820.0, ans=0.2 2024-08-15 17:37:55,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2024-08-15 17:37:58,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3293020.0, ans=0.125 2024-08-15 17:38:01,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.31 vs. 
limit=15.0 2024-08-15 17:38:07,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3293020.0, ans=0.0 2024-08-15 17:38:12,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293120.0, ans=0.1 2024-08-15 17:38:13,399 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 17:38:19,248 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 17:38:28,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10500, loss[loss=0.1332, beats_loss=0.00897, ecapa_loss=0.0001244, whisper_loss=0.123, over 20377.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001496, whisper_loss=0.08947, over 3877207.87 frames. ], batch size: 77, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:38:29,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-08-15 17:38:34,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. 
limit=15.0 2024-08-15 17:39:02,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3293420.0, ans=0.07 2024-08-15 17:39:03,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3293420.0, ans=0.0 2024-08-15 17:39:14,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3293520.0, ans=0.0 2024-08-15 17:39:21,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3293520.0, ans=0.125 2024-08-15 17:39:26,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=12.0 2024-08-15 17:39:27,066 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 17:39:35,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3293620.0, ans=0.0 2024-08-15 17:39:36,212 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-15 17:39:42,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10550, loss[loss=0.08535, beats_loss=0.01082, ecapa_loss=0.0001443, whisper_loss=0.07309, over 20901.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01069, ecapa_loss=0.0001505, whisper_loss=0.08848, over 3836714.91 frames. 
], batch size: 84, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:39:43,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293720.0, ans=0.1 2024-08-15 17:40:01,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.311e+01 2.619e+01 2.877e+01 4.261e+01, threshold=5.238e+01, percent-clipped=0.0 2024-08-15 17:40:07,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3293820.0, ans=0.2 2024-08-15 17:40:40,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3294120.0, ans=0.125 2024-08-15 17:40:54,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10600, loss[loss=0.08961, beats_loss=0.01087, ecapa_loss=0.000154, whisper_loss=0.0772, over 21625.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01062, ecapa_loss=0.000152, whisper_loss=0.08919, over 3866026.33 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:40:57,243 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-15 17:41:00,508 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 17:41:01,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=15.0 2024-08-15 17:41:07,600 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 17:41:27,275 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
35 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-15 17:41:36,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3294520.0, ans=0.07 2024-08-15 17:41:37,952 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 19 from LS+wenet, 27 from Vox, 48 fro AS 2024-08-15 17:41:42,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294520.0, ans=0.1 2024-08-15 17:41:49,217 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 17:42:06,131 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10650, loss[loss=0.1183, beats_loss=0.01015, ecapa_loss=0.0001183, whisper_loss=0.1069, over 23479.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001507, whisper_loss=0.08936, over 3858287.39 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:42:09,367 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 17:42:13,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3294720.0, ans=0.125 2024-08-15 17:42:19,501 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 17:42:24,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.386e+01 2.620e+01 3.005e+01 5.015e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-15 17:42:32,299 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-15 17:42:34,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3294920.0, ans=0.0 2024-08-15 17:42:41,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3294920.0, ans=0.125 2024-08-15 17:42:43,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3294920.0, ans=0.125 2024-08-15 17:42:57,185 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 17:43:04,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3295120.0, ans=0.2 2024-08-15 17:43:07,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3295120.0, ans=0.2 2024-08-15 17:43:19,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10700, loss[loss=0.08113, beats_loss=0.01443, ecapa_loss=0.0001282, whisper_loss=0.06542, over 21854.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001494, whisper_loss=0.09016, over 3888148.54 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:43:29,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3295220.0, ans=0.125 2024-08-15 17:43:29,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3295220.0, ans=0.0 2024-08-15 17:43:34,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. 
limit=10.0 2024-08-15 17:43:37,024 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.205e+00 2024-08-15 17:43:48,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3295420.0, ans=0.125 2024-08-15 17:43:49,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-15 17:44:01,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0 2024-08-15 17:44:05,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295520.0, ans=0.1 2024-08-15 17:44:09,866 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 17:44:30,236 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:44:32,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10750, loss[loss=0.09012, beats_loss=0.0127, ecapa_loss=0.0001245, whisper_loss=0.07617, over 19577.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001486, whisper_loss=0.09072, over 3883685.75 frames. ], batch size: 79, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:44:32,887 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 17:44:50,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.313e+01 2.649e+01 2.924e+01 4.383e+01, threshold=5.299e+01, percent-clipped=0.0 2024-08-15 17:44:54,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3295820.0, ans=0.05 2024-08-15 17:44:55,737 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 17:44:57,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3295820.0, ans=0.125 2024-08-15 17:45:22,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3296020.0, ans=0.2 2024-08-15 17:45:33,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3296120.0, ans=0.2 2024-08-15 17:45:35,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3296120.0, ans=0.1 2024-08-15 17:45:42,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3296120.0, ans=0.125 2024-08-15 17:45:44,503 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10800, loss[loss=0.1131, beats_loss=0.01031, ecapa_loss=0.0001358, whisper_loss=0.1014, over 18386.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001471, whisper_loss=0.09039, over 3894631.33 frames. 
], batch size: 73, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:45:55,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3296220.0, ans=0.0 2024-08-15 17:46:12,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3296420.0, ans=0.125 2024-08-15 17:46:21,629 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 17:46:47,334 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-15 17:46:54,316 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04891674965620041, model_norm_threshold=52.98820877075195 2024-08-15 17:46:54,482 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.961e+05, grad_sumsq=3.961e+05, orig_rms_sq=1.000e+00 2024-08-15 17:46:55,800 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10850, loss[loss=0.09306, beats_loss=0.009314, ecapa_loss=0.0001662, whisper_loss=0.08209, over 16129.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.000148, whisper_loss=0.09041, over 3873212.47 frames. ], batch size: 65, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:46:56,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3296720.0, ans=0.125 2024-08-15 17:47:05,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3296720.0, ans=0.125 2024-08-15 17:47:09,861 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
12 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 17:47:13,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.310e+01 2.561e+01 2.919e+01 1.083e+03, threshold=5.121e+01, percent-clipped=1.0 2024-08-15 17:47:18,461 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 17:47:23,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=12.0 2024-08-15 17:47:24,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3296920.0, ans=0.1 2024-08-15 17:47:39,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3297020.0, ans=0.0 2024-08-15 17:47:44,865 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 19 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 17:47:45,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3297020.0, ans=0.1 2024-08-15 17:47:51,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2024-08-15 17:47:56,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=22.5 2024-08-15 17:48:00,025 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 16 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-15 17:48:01,380 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 17:48:08,175 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10900, loss[loss=0.07294, beats_loss=0.01219, ecapa_loss=0.0001101, whisper_loss=0.05965, over 14132.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001479, whisper_loss=0.09086, over 3860115.85 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:48:08,421 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 23 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-15 17:48:14,637 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:48:29,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3297320.0, ans=0.125 2024-08-15 17:48:43,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3297420.0, ans=0.0 2024-08-15 17:48:53,457 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-15 17:48:54,887 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 17:49:08,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3297620.0, ans=0.5 2024-08-15 17:49:11,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3297620.0, ans=0.125 2024-08-15 17:49:20,706 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 10950, loss[loss=0.1048, beats_loss=0.01066, ecapa_loss=0.0001676, whisper_loss=0.09244, over 21793.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001487, whisper_loss=0.09094, over 3860900.27 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:49:29,088 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 17:49:40,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.372e+01 2.632e+01 3.011e+01 4.357e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-15 17:49:46,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3297820.0, ans=0.125 2024-08-15 17:49:46,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3297820.0, ans=0.0 2024-08-15 17:49:51,705 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 17:49:55,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3297920.0, ans=0.125 2024-08-15 17:49:55,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3297920.0, ans=0.0 2024-08-15 17:50:27,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2024-08-15 17:50:31,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11000, loss[loss=0.1123, beats_loss=0.01023, ecapa_loss=0.0001613, whisper_loss=0.1005, over 22661.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001497, whisper_loss=0.09108, over 3875126.05 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:50:43,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298220.0, ans=0.1 2024-08-15 17:50:49,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3298320.0, ans=0.2 2024-08-15 17:51:02,001 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 17:51:06,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3298420.0, ans=0.125 2024-08-15 17:51:10,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3298420.0, ans=0.1 2024-08-15 17:51:25,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3298520.0, ans=0.2 2024-08-15 17:51:27,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5 2024-08-15 17:51:35,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3298620.0, ans=0.125 2024-08-15 17:51:41,976 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11050, loss[loss=0.08316, beats_loss=0.009771, ecapa_loss=0.0001369, whisper_loss=0.07202, over 17649.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001501, whisper_loss=0.09124, over 3904008.73 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:52:03,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.406e+01 2.620e+01 2.867e+01 4.013e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-15 17:52:24,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3299020.0, ans=0.07 2024-08-15 17:52:38,768 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-15 17:52:54,537 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11100, loss[loss=0.0926, beats_loss=0.01123, ecapa_loss=0.0001728, whisper_loss=0.07964, over 17225.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09112, over 3899839.14 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:53:36,137 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 17:53:42,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3299520.0, ans=0.95 2024-08-15 17:54:05,873 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 17:54:08,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11150, loss[loss=0.09779, beats_loss=0.01106, ecapa_loss=0.0001613, whisper_loss=0.08512, over 18577.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001496, whisper_loss=0.09129, over 3894562.23 frames. ], batch size: 75, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:54:11,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3299720.0, ans=0.0 2024-08-15 17:54:16,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3299720.0, ans=0.0 2024-08-15 17:54:28,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.377e+01 2.635e+01 3.031e+01 4.135e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-15 17:54:33,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3299820.0, ans=0.1 2024-08-15 17:54:37,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3299920.0, ans=0.1 2024-08-15 17:54:42,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. 
limit=22.5 2024-08-15 17:54:57,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3300020.0, ans=0.125 2024-08-15 17:55:04,535 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 17:55:06,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3300120.0, ans=0.125 2024-08-15 17:55:17,567 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 13 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 17:55:20,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11200, loss[loss=0.1151, beats_loss=0.009901, ecapa_loss=0.0001474, whisper_loss=0.1038, over 18020.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.000148, whisper_loss=0.09068, over 3908336.71 frames. ], batch size: 73, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:55:29,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-15 17:55:33,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-15 17:55:33,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2024-08-15 17:55:42,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3300320.0, ans=0.125 2024-08-15 17:55:49,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3300420.0, ans=0.125 2024-08-15 17:55:52,936 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
40 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 17:55:53,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2024-08-15 17:55:54,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3300420.0, ans=0.125 2024-08-15 17:56:01,090 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 17:56:10,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3300520.0, ans=0.125 2024-08-15 17:56:14,686 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 17:56:33,537 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11250, loss[loss=0.1461, beats_loss=0.006421, ecapa_loss=0.000154, whisper_loss=0.1382, over 22348.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001483, whisper_loss=0.0911, over 3926494.14 frames. ], batch size: 84, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:56:43,558 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 17:56:53,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.493e+01 2.758e+01 4.504e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 17:56:55,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3300820.0, ans=0.125 2024-08-15 17:57:00,991 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
31 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 17:57:05,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3300920.0, ans=0.1 2024-08-15 17:57:12,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3300920.0, ans=0.2 2024-08-15 17:57:22,452 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-15 17:57:28,637 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:57:35,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3301120.0, ans=0.0 2024-08-15 17:57:44,728 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11300, loss[loss=0.1167, beats_loss=0.008359, ecapa_loss=0.000152, whisper_loss=0.1068, over 18459.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.09055, over 3928323.36 frames. ], batch size: 72, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:57:45,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3301220.0, ans=0.0 2024-08-15 17:57:52,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301220.0, ans=0.1 2024-08-15 17:57:57,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.88 vs. limit=22.5 2024-08-15 17:57:58,442 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:58:40,350 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 17:58:57,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11350, loss[loss=0.07882, beats_loss=0.01182, ecapa_loss=0.0002021, whisper_loss=0.06498, over 15072.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001493, whisper_loss=0.09136, over 3902051.42 frames. ], batch size: 67, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:59:13,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.50 vs. limit=10.0 2024-08-15 17:59:13,791 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 17:59:15,321 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 17:59:17,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.304e+01 2.607e+01 2.921e+01 2.640e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 17:59:50,532 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-15 18:00:08,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3302120.0, ans=0.125 2024-08-15 18:00:11,391 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11400, loss[loss=0.07842, beats_loss=0.01153, ecapa_loss=0.000126, whisper_loss=0.06563, over 14076.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01045, ecapa_loss=0.0001492, whisper_loss=0.09148, over 3838682.74 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:00:23,493 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 18:00:37,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302320.0, ans=0.1 2024-08-15 18:00:49,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3302420.0, ans=0.125 2024-08-15 18:00:53,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3302420.0, ans=0.125 2024-08-15 18:00:54,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3302420.0, ans=0.0 2024-08-15 18:01:02,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3302520.0, ans=0.0 2024-08-15 18:01:05,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=15.0 2024-08-15 18:01:16,035 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.33 vs. limit=15.0 2024-08-15 18:01:26,344 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11450, loss[loss=0.101, beats_loss=0.01246, ecapa_loss=0.0001504, whisper_loss=0.087, over 22756.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.0001502, whisper_loss=0.09164, over 3860334.12 frames. ], batch size: 96, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:01:46,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.318e+01 2.537e+01 2.814e+01 4.367e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-15 18:01:49,597 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 18:01:52,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3302820.0, ans=0.125 2024-08-15 18:01:55,623 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-15 18:02:18,168 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 15 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 18:02:20,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.42 vs. limit=22.5 2024-08-15 18:02:24,426 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 26 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-15 18:02:29,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3303120.0, ans=0.04949747468305833 2024-08-15 18:02:36,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3303120.0, ans=0.0 2024-08-15 18:02:38,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3303220.0, ans=0.0 2024-08-15 18:02:38,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11500, loss[loss=0.09203, beats_loss=0.01291, ecapa_loss=0.0001153, whisper_loss=0.07797, over 22480.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01041, ecapa_loss=0.00015, whisper_loss=0.09202, over 3874605.64 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:02:40,562 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 18:02:40,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3303220.0, ans=0.125 2024-08-15 18:02:44,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3303220.0, ans=0.125 2024-08-15 18:02:48,343 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 18:03:06,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3303320.0, ans=0.125 2024-08-15 18:03:33,809 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-15 18:03:35,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3303520.0, ans=0.125 2024-08-15 18:03:37,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-15 18:03:42,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3303620.0, ans=0.1 2024-08-15 18:03:42,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3303620.0, ans=0.0 2024-08-15 18:03:52,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11550, loss[loss=0.1053, beats_loss=0.008753, ecapa_loss=0.0001533, whisper_loss=0.09503, over 16378.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01039, ecapa_loss=0.0001501, whisper_loss=0.09198, over 3870863.16 frames. 
], batch size: 66, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:03:53,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3303720.0, ans=0.125 2024-08-15 18:04:12,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.433e+01 2.629e+01 2.861e+01 8.078e+01, threshold=5.258e+01, percent-clipped=1.0 2024-08-15 18:04:21,459 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=15.0 2024-08-15 18:04:47,896 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 18:05:08,389 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11600, loss[loss=0.08025, beats_loss=0.01224, ecapa_loss=0.0001403, whisper_loss=0.06661, over 14335.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.0001492, whisper_loss=0.09173, over 3844425.24 frames. ], batch size: 57, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:05:15,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3304220.0, ans=0.125 2024-08-15 18:05:20,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3304220.0, ans=0.125 2024-08-15 18:05:23,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.44 vs. 
limit=15.0 2024-08-15 18:05:30,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3304320.0, ans=0.07 2024-08-15 18:05:44,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3304420.0, ans=0.0 2024-08-15 18:05:44,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2024-08-15 18:05:44,835 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2024-08-15 18:05:53,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2024-08-15 18:05:55,300 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0450630709528923, model_norm_threshold=52.58251190185547 2024-08-15 18:05:55,469 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.906e+05, grad_sumsq=1.906e+05, orig_rms_sq=1.000e+00 2024-08-15 18:06:03,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3304520.0, ans=0.0 2024-08-15 18:06:08,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0 2024-08-15 18:06:20,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11650, loss[loss=0.1018, beats_loss=0.01021, ecapa_loss=0.0001759, whisper_loss=0.08978, over 21546.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0104, ecapa_loss=0.0001503, whisper_loss=0.09179, over 3881983.86 frames. 
], batch size: 90, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:06:32,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3304720.0, ans=0.125 2024-08-15 18:06:35,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2024-08-15 18:06:40,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.450e+01 2.772e+01 2.999e+01 1.167e+03, threshold=5.544e+01, percent-clipped=1.0 2024-08-15 18:07:16,090 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 18:07:22,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3305120.0, ans=0.2 2024-08-15 18:07:26,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3305120.0, ans=0.05 2024-08-15 18:07:31,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11700, loss[loss=0.104, beats_loss=0.01167, ecapa_loss=0.0001367, whisper_loss=0.09095, over 21907.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01045, ecapa_loss=0.0001503, whisper_loss=0.09202, over 3905593.64 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:07:31,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3305220.0, ans=0.0 2024-08-15 18:07:38,656 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-15 18:08:00,018 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 41 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 18:08:10,016 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 18:08:17,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3305520.0, ans=0.0 2024-08-15 18:08:18,743 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 18:08:27,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3305620.0, ans=0.025 2024-08-15 18:08:37,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3305620.0, ans=0.025 2024-08-15 18:08:40,498 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 18:08:43,486 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11750, loss[loss=0.0993, beats_loss=0.01155, ecapa_loss=0.0001607, whisper_loss=0.08614, over 22132.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001496, whisper_loss=0.09146, over 3897035.89 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:08:46,546 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 18:08:50,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3305720.0, ans=0.125 2024-08-15 18:09:03,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.526e+01 2.838e+01 3.948e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-15 18:09:12,648 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 21 from Vox, 28 from AS
2024-08-15 18:09:18,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3305920.0, ans=0.1
2024-08-15 18:09:27,052 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS
2024-08-15 18:09:48,878 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 25 from Vox, 27 from AS
2024-08-15 18:09:55,765 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11800, loss[loss=0.1119, beats_loss=0.01084, ecapa_loss=0.0001537, whisper_loss=0.09949, over 22432.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001491, whisper_loss=0.09087, over 3875914.78 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:09:58,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3306220.0, ans=0.025
2024-08-15 18:10:05,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0
2024-08-15 18:10:11,871 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 from AS
2024-08-15 18:10:20,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3306320.0, ans=0.125
2024-08-15 18:10:39,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3306520.0, ans=0.125
2024-08-15 18:10:51,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0
2024-08-15 18:11:05,906 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
19 from LS+wenet, 22 from Vox, 50 from AS
2024-08-15 18:11:08,368 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11850, loss[loss=0.1198, beats_loss=0.01042, ecapa_loss=0.0001265, whisper_loss=0.1081, over 23705.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001483, whisper_loss=0.09017, over 3901339.39 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:11:17,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3306720.0, ans=0.5
2024-08-15 18:11:20,446 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 11 from LS+wenet, 21 from Vox, 26 from AS
2024-08-15 18:11:23,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3306820.0, ans=0.125
2024-08-15 18:11:28,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.292e+01 2.620e+01 2.942e+01 3.993e+01, threshold=5.240e+01, percent-clipped=0.0
2024-08-15 18:11:44,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3306920.0, ans=0.125
2024-08-15 18:11:52,922 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 from AS
2024-08-15 18:11:57,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3307020.0, ans=0.125
2024-08-15 18:12:14,396 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 18:12:20,222 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11900, loss[loss=0.1092, beats_loss=0.01118, ecapa_loss=0.0001426, whisper_loss=0.09664, over 22368.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001488, whisper_loss=0.09027, over 3915171.48 frames.
], batch size: 89, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:12:34,564 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 20 from Vox, 32 from AS
2024-08-15 18:12:38,935 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 from AS
2024-08-15 18:12:40,181 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 20 from LS+wenet, 8 from Vox, 25 from AS
2024-08-15 18:13:06,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3307520.0, ans=0.0
2024-08-15 18:13:11,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3307520.0, ans=0.0
2024-08-15 18:13:15,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307520.0, ans=0.1
2024-08-15 18:13:27,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3307620.0, ans=0.125
2024-08-15 18:13:30,144 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 21 from Vox, 49 from AS
2024-08-15 18:13:33,020 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 11950, loss[loss=0.1087, beats_loss=0.01075, ecapa_loss=0.0001766, whisper_loss=0.0962, over 16521.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001489, whisper_loss=0.09018, over 3868170.12 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:13:42,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.34 vs.
limit=15.0
2024-08-15 18:13:52,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.261e+01 2.658e+01 2.941e+01 1.221e+02, threshold=5.315e+01, percent-clipped=2.0
2024-08-15 18:13:54,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3307820.0, ans=0.0
2024-08-15 18:13:57,835 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0
2024-08-15 18:14:02,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3307920.0, ans=0.125
2024-08-15 18:14:03,988 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 18:14:24,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3308020.0, ans=0.0
2024-08-15 18:14:27,121 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS
2024-08-15 18:14:44,627 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12000, loss[loss=0.09748, beats_loss=0.009878, ecapa_loss=0.000199, whisper_loss=0.08562, over 21223.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001493, whisper_loss=0.08983, over 3861774.63 frames. ], batch size: 94, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:14:44,627 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-15 18:15:24,370 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0.2463, over 922467.00 frames.
2024-08-15 18:15:43,411 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on SV_voxceleb1: loss=0.004172, beats_loss=0, ecapa_loss=0.0004172, whisper_loss=0, over 939242.00 frames.
2024-08-15 18:16:35,517 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9395, 5.7092, 5.8806, 5.9289], device='cuda:1')
2024-08-15 18:17:41,567 INFO [train_multi_KD3.py:1149] (1/4) Epoch 23, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-15 18:17:41,570 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 30639MB
2024-08-15 18:17:58,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3308320.0, ans=0.1
2024-08-15 18:17:59,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0
2024-08-15 18:18:08,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0
2024-08-15 18:18:19,011 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS
2024-08-15 18:18:19,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3308420.0, ans=0.125
2024-08-15 18:18:20,410 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS
2024-08-15 18:18:37,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3308520.0, ans=0.2
2024-08-15 18:18:55,720 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12050, loss[loss=0.09302, beats_loss=0.01006, ecapa_loss=0.0001015, whisper_loss=0.08195, over 15653.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.000149, whisper_loss=0.09001, over 3846106.25 frames.
], batch size: 57, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:19:02,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3308720.0, ans=0.0
2024-08-15 18:19:14,026 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 28 from Vox, 39 from AS
2024-08-15 18:19:16,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.282e+01 2.556e+01 2.851e+01 3.972e+01, threshold=5.113e+01, percent-clipped=0.0
2024-08-15 18:19:24,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3308920.0, ans=0.1
2024-08-15 18:19:37,723 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 from AS
2024-08-15 18:19:54,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0
2024-08-15 18:19:56,292 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0
2024-08-15 18:20:01,569 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.13 vs. limit=15.0
2024-08-15 18:20:10,329 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12100, loss[loss=0.09821, beats_loss=0.01144, ecapa_loss=0.0001177, whisper_loss=0.08559, over 21855.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001493, whisper_loss=0.09001, over 3829958.78 frames. ], batch size: 87, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:20:15,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3309220.0, ans=0.125
2024-08-15 18:20:19,260 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts.
20 from LS+wenet, 15 from Vox, 28 from AS
2024-08-15 18:20:20,750 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 21 from LS+wenet, 31 from Vox, 45 from AS
2024-08-15 18:20:23,656 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 33 from LS+wenet, 14 from Vox, 30 from AS
2024-08-15 18:20:29,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3309320.0, ans=0.125
2024-08-15 18:20:31,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3309320.0, ans=0.035
2024-08-15 18:20:56,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3309520.0, ans=0.125
2024-08-15 18:21:11,511 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 31 from Vox, 38 from AS
2024-08-15 18:21:17,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3309620.0, ans=0.0
2024-08-15 18:21:20,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3309620.0, ans=0.0
2024-08-15 18:21:24,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12150, loss[loss=0.101, beats_loss=0.01175, ecapa_loss=0.000148, whisper_loss=0.08779, over 22946.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001503, whisper_loss=0.09018, over 3825660.73 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:21:25,033 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts.
30 from LS+wenet, 25 from Vox, 39 from AS
2024-08-15 18:21:33,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3309720.0, ans=0.0
2024-08-15 18:21:46,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.187e+01 2.499e+01 2.897e+01 4.006e+01, threshold=4.998e+01, percent-clipped=0.0
2024-08-15 18:21:58,514 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 from AS
2024-08-15 18:22:06,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3309920.0, ans=0.09899494936611666
2024-08-15 18:22:20,845 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS
2024-08-15 18:22:27,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3310120.0, ans=0.0
2024-08-15 18:22:33,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3310120.0, ans=22.5
2024-08-15 18:22:39,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12200, loss[loss=0.07596, beats_loss=0.01475, ecapa_loss=0.0001233, whisper_loss=0.05998, over 21730.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001495, whisper_loss=0.08953, over 3813487.74 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:22:43,406 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 10 from Vox, 22 from AS
2024-08-15 18:22:52,599 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 from AS
2024-08-15 18:22:56,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs.
limit=15.0
2024-08-15 18:23:00,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0
2024-08-15 18:23:10,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3310420.0, ans=0.0
2024-08-15 18:23:13,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=15.0
2024-08-15 18:23:23,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3310520.0, ans=0.0
2024-08-15 18:23:38,827 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 29 from Vox, 29 from AS
2024-08-15 18:23:47,688 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 from AS
2024-08-15 18:23:54,452 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12250, loss[loss=0.08424, beats_loss=0.01125, ecapa_loss=0.0001924, whisper_loss=0.07107, over 21486.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001492, whisper_loss=0.09034, over 3824027.62 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:24:11,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3310820.0, ans=0.0
2024-08-15 18:24:15,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.354e+01 2.587e+01 2.883e+01 9.186e+01, threshold=5.174e+01, percent-clipped=2.0
2024-08-15 18:24:56,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3311120.0, ans=0.125
2024-08-15 18:25:08,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12300, loss[loss=0.1081, beats_loss=0.008838, ecapa_loss=0.0001441, whisper_loss=0.09784, over 16475.00 frames.
], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001497, whisper_loss=0.09009, over 3834934.07 frames. ], batch size: 64, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:25:16,698 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 from AS
2024-08-15 18:25:32,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3311320.0, ans=0.1
2024-08-15 18:25:33,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3311320.0, ans=0.2
2024-08-15 18:26:05,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3311520.0, ans=0.1
2024-08-15 18:26:23,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3311720.0, ans=0.02
2024-08-15 18:26:24,226 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12350, loss[loss=0.112, beats_loss=0.00957, ecapa_loss=0.0001404, whisper_loss=0.101, over 23287.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001507, whisper_loss=0.09075, over 3869209.34 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:26:27,588 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts.
18 from LS+wenet, 22 from Vox, 30 from AS
2024-08-15 18:26:34,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3311720.0, ans=0.0
2024-08-15 18:26:41,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3311820.0, ans=0.0
2024-08-15 18:26:44,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.423e+01 2.680e+01 3.098e+01 2.023e+02, threshold=5.359e+01, percent-clipped=1.0
2024-08-15 18:26:45,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3311820.0, ans=0.2
2024-08-15 18:26:50,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3311820.0, ans=0.0
2024-08-15 18:26:55,564 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 from AS
2024-08-15 18:26:57,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.87 vs. limit=5.0
2024-08-15 18:27:13,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3312020.0, ans=0.2
2024-08-15 18:27:27,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3312120.0, ans=0.2
2024-08-15 18:27:38,591 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12400, loss[loss=0.09075, beats_loss=0.01349, ecapa_loss=0.0001045, whisper_loss=0.07621, over 23333.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001497, whisper_loss=0.09035, over 3883192.03 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:28:05,685 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts.
15 from LS+wenet, 11 from Vox, 27 from AS
2024-08-15 18:28:29,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3312520.0, ans=0.125
2024-08-15 18:28:33,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3312520.0, ans=0.0
2024-08-15 18:28:33,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3312520.0, ans=0.125
2024-08-15 18:28:40,582 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 14 from Vox, 28 from AS
2024-08-15 18:28:52,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12450, loss[loss=0.09818, beats_loss=0.01167, ecapa_loss=0.0001317, whisper_loss=0.08519, over 14425.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001497, whisper_loss=0.09052, over 3888483.17 frames. ], batch size: 55, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:28:54,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3312720.0, ans=0.125
2024-08-15 18:28:57,580 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 from AS
2024-08-15 18:29:11,284 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 19 from LS+wenet, 29 from Vox, 46 from AS
2024-08-15 18:29:13,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.349e+01 2.594e+01 2.911e+01 3.951e+02, threshold=5.187e+01, percent-clipped=3.0
2024-08-15 18:29:14,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3312820.0, ans=0.0
2024-08-15 18:29:20,280 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 29 from Vox, 42 from AS
2024-08-15 18:29:21,487 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts.
25 from LS+wenet, 13 from Vox, 19 from AS
2024-08-15 18:29:23,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3312920.0, ans=15.0
2024-08-15 18:29:31,967 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 from AS
2024-08-15 18:29:36,485 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 25 from LS+wenet, 18 from Vox, 53 from AS
2024-08-15 18:29:56,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0
2024-08-15 18:30:07,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12500, loss[loss=0.1004, beats_loss=0.009457, ecapa_loss=0.0001448, whisper_loss=0.08951, over 22840.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001486, whisper_loss=0.09079, over 3901836.04 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:30:15,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3313220.0, ans=0.0
2024-08-15 18:30:19,782 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 from AS
2024-08-15 18:30:32,961 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 from AS
2024-08-15 18:30:54,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3313520.0, ans=0.125
2024-08-15 18:30:58,768 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts.
21 from LS+wenet, 18 from Vox, 21 from AS
2024-08-15 18:31:18,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3313620.0, ans=0.1
2024-08-15 18:31:23,185 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12550, loss[loss=0.1107, beats_loss=0.009712, ecapa_loss=0.0001256, whisper_loss=0.09974, over 16858.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001488, whisper_loss=0.0908, over 3921699.92 frames. ], batch size: 61, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:31:39,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3313820.0, ans=0.125
2024-08-15 18:31:44,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.273e+01 2.491e+01 2.694e+01 3.703e+01, threshold=4.981e+01, percent-clipped=0.0
2024-08-15 18:31:45,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3313820.0, ans=0.0
2024-08-15 18:31:50,946 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 21 from Vox, 49 from AS
2024-08-15 18:31:53,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.32 vs.
limit=15.0
2024-08-15 18:32:00,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3313920.0, ans=0.125
2024-08-15 18:32:13,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3314020.0, ans=0.0
2024-08-15 18:32:15,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3314020.0, ans=0.125
2024-08-15 18:32:16,434 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS
2024-08-15 18:32:16,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3314020.0, ans=0.1
2024-08-15 18:32:23,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0
2024-08-15 18:32:31,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3314120.0, ans=0.5
2024-08-15 18:32:37,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3314220.0, ans=0.1
2024-08-15 18:32:39,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12600, loss[loss=0.09886, beats_loss=0.01132, ecapa_loss=0.0001438, whisper_loss=0.0861, over 21204.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001483, whisper_loss=0.09062, over 3931516.69 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:32:54,694 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts.
24 from LS+wenet, 23 from Vox, 31 from AS
2024-08-15 18:32:58,707 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08482968807220459, model_norm_threshold=49.81049346923828
2024-08-15 18:32:58,879 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.887e+04, grad_sumsq=6.887e+04, orig_rms_sq=1.000e+00
2024-08-15 18:33:12,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3314420.0, ans=0.125
2024-08-15 18:33:19,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3314420.0, ans=0.125
2024-08-15 18:33:42,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3314620.0, ans=0.0
2024-08-15 18:33:47,639 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 30 from Vox, 30 from AS
2024-08-15 18:33:53,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12650, loss[loss=0.118, beats_loss=0.01131, ecapa_loss=0.0001217, whisper_loss=0.1055, over 22041.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001495, whisper_loss=0.09004, over 3916770.27 frames.
], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:33:56,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3314720.0, ans=0.0
2024-08-15 18:33:59,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3314720.0, ans=0.0
2024-08-15 18:34:05,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3314720.0, ans=0.0
2024-08-15 18:34:13,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.371e+01 2.618e+01 2.895e+01 5.872e+02, threshold=5.236e+01, percent-clipped=1.0
2024-08-15 18:34:15,116 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 from AS
2024-08-15 18:34:17,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3314820.0, ans=0.0
2024-08-15 18:34:17,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3314820.0, ans=0.125
2024-08-15 18:34:23,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0
2024-08-15 18:34:27,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3314920.0, ans=0.125
2024-08-15 18:34:28,924 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 26 from Vox, 37 from AS
2024-08-15 18:34:38,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3315020.0, ans=0.05
2024-08-15 18:34:40,741 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts.
15 from LS+wenet, 21 from Vox, 24 from AS
2024-08-15 18:34:45,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3315020.0, ans=0.0
2024-08-15 18:34:51,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=22.5
2024-08-15 18:34:55,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3315120.0, ans=0.125
2024-08-15 18:35:06,295 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 18:35:07,063 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12700, loss[loss=0.1046, beats_loss=0.01225, ecapa_loss=0.0001199, whisper_loss=0.09115, over 17935.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001485, whisper_loss=0.09053, over 3899324.96 frames. ], batch size: 70, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:35:27,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.65 vs. limit=15.0
2024-08-15 18:35:37,688 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 from AS
2024-08-15 18:35:54,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3315520.0, ans=0.0
2024-08-15 18:36:07,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3315620.0, ans=0.0
2024-08-15 18:36:22,300 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12750, loss[loss=0.07322, beats_loss=0.01271, ecapa_loss=0.000169, whisper_loss=0.05882, over 18051.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001496, whisper_loss=0.0909, over 3885211.39 frames.
], batch size: 78, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:36:22,695 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 31 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-15 18:36:24,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3315720.0, ans=0.1 2024-08-15 18:36:31,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3315720.0, ans=0.125 2024-08-15 18:36:38,972 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 41 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 18:36:43,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.562e+01 2.837e+01 4.631e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-15 18:36:45,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-08-15 18:36:46,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3315820.0, ans=0.125 2024-08-15 18:37:00,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3315920.0, ans=0.125 2024-08-15 18:37:06,424 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 18:37:21,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-15 18:37:22,357 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.732e+01 2024-08-15 18:37:28,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.23 vs. 
limit=10.0 2024-08-15 18:37:29,393 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 35 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 18:37:32,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3316120.0, ans=0.0 2024-08-15 18:37:36,480 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12800, loss[loss=0.1268, beats_loss=0.009001, ecapa_loss=0.0001315, whisper_loss=0.1165, over 24150.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.0001498, whisper_loss=0.09057, over 3928697.04 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:37:38,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3316220.0, ans=0.125 2024-08-15 18:37:41,053 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 18:37:47,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3316220.0, ans=0.125 2024-08-15 18:37:59,518 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 18:37:59,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3316320.0, ans=0.0 2024-08-15 18:38:26,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3316520.0, ans=0.125 2024-08-15 18:38:49,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3316620.0, ans=0.125 2024-08-15 18:38:51,821 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.04 vs. 
limit=15.0 2024-08-15 18:38:52,359 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12850, loss[loss=0.1008, beats_loss=0.01026, ecapa_loss=0.0001311, whisper_loss=0.08923, over 16192.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001513, whisper_loss=0.08979, over 3901115.62 frames. ], batch size: 62, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:39:13,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.324e+01 2.629e+01 2.874e+01 4.372e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-15 18:39:14,144 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 18:39:19,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3316820.0, ans=0.125 2024-08-15 18:39:23,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3316920.0, ans=0.125 2024-08-15 18:39:30,720 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 18:39:32,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3316920.0, ans=0.125 2024-08-15 18:39:39,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3317020.0, ans=0.125 2024-08-15 18:39:50,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-15 18:39:57,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3317120.0, ans=0.1 2024-08-15 18:39:58,695 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 18:40:03,054 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 18:40:07,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12900, loss[loss=0.06511, beats_loss=0.01318, ecapa_loss=0.0001349, whisper_loss=0.05059, over 14034.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01077, ecapa_loss=0.0001504, whisper_loss=0.08973, over 3893586.79 frames. ], batch size: 59, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:40:31,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3317320.0, ans=0.125 2024-08-15 18:40:54,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-15 18:41:00,769 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 19 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-15 18:41:21,518 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 12950, loss[loss=0.09481, beats_loss=0.01144, ecapa_loss=0.0001307, whisper_loss=0.08207, over 20630.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01079, ecapa_loss=0.0001508, whisper_loss=0.08923, over 3862222.97 frames. ], batch size: 81, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:41:40,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.294e+01 2.551e+01 2.879e+01 4.880e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-15 18:41:50,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3317920.0, ans=0.0 2024-08-15 18:42:20,194 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
27 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 18:42:34,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13000, loss[loss=0.08567, beats_loss=0.01212, ecapa_loss=0.0001393, whisper_loss=0.07216, over 20018.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001502, whisper_loss=0.08931, over 3892361.26 frames. ], batch size: 80, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:42:39,132 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 18:42:45,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3318220.0, ans=0.2 2024-08-15 18:42:51,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3318320.0, ans=0.2 2024-08-15 18:43:01,069 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 18:43:18,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2024-08-15 18:43:20,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3318520.0, ans=0.125 2024-08-15 18:43:29,697 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-15 18:43:30,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3318520.0, ans=0.2 2024-08-15 18:43:48,946 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13050, loss[loss=0.09665, beats_loss=0.01398, ecapa_loss=0.0001168, whisper_loss=0.0815, over 23466.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01076, ecapa_loss=0.0001487, whisper_loss=0.08962, over 3878959.72 frames. 
], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:43:55,019 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-15 18:44:09,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.364e+01 2.592e+01 2.928e+01 7.191e+01, threshold=5.184e+01, percent-clipped=1.0 2024-08-15 18:44:11,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3318820.0, ans=0.2 2024-08-15 18:44:17,204 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 18:44:23,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3318920.0, ans=0.05 2024-08-15 18:44:29,327 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:44:35,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3319020.0, ans=0.125 2024-08-15 18:44:36,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3319020.0, ans=0.05 2024-08-15 18:44:49,027 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-15 18:44:51,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3319120.0, ans=0.1 2024-08-15 18:44:51,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3319120.0, ans=0.125 2024-08-15 18:44:57,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.35 vs. 
limit=15.0 2024-08-15 18:45:01,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3319220.0, ans=0.125 2024-08-15 18:45:01,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13100, loss[loss=0.1005, beats_loss=0.01109, ecapa_loss=0.0001419, whisper_loss=0.08801, over 22952.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01076, ecapa_loss=0.000148, whisper_loss=0.08998, over 3910594.11 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:45:13,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3319220.0, ans=0.0 2024-08-15 18:45:19,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3319320.0, ans=10.0 2024-08-15 18:45:24,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=22.5 2024-08-15 18:45:37,397 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 18:45:38,874 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 18:45:41,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2024-08-15 18:45:42,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3319420.0, ans=0.2 2024-08-15 18:45:54,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3319520.0, ans=0.2 2024-08-15 18:46:06,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. 
limit=15.0 2024-08-15 18:46:15,353 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 18:46:16,861 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 18:46:19,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13150, loss[loss=0.08625, beats_loss=0.01053, ecapa_loss=0.0001228, whisper_loss=0.07449, over 15368.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.000148, whisper_loss=0.09043, over 3884565.02 frames. ], batch size: 58, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:46:25,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3319720.0, ans=0.2 2024-08-15 18:46:29,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3319720.0, ans=0.125 2024-08-15 18:46:32,066 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-15 18:46:33,710 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-15 18:46:40,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3319820.0, ans=0.0 2024-08-15 18:46:41,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.374e+01 2.573e+01 2.884e+01 4.147e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-15 18:47:24,123 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 18:47:29,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.28 vs. 
limit=10.0 2024-08-15 18:47:32,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3320120.0, ans=0.1 2024-08-15 18:47:40,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3320120.0, ans=0.125 2024-08-15 18:47:43,747 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13200, loss[loss=0.1277, beats_loss=0.008573, ecapa_loss=0.0001667, whisper_loss=0.1175, over 17429.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001496, whisper_loss=0.09099, over 3897695.08 frames. ], batch size: 69, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:48:05,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3320320.0, ans=0.035 2024-08-15 18:48:25,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3320420.0, ans=0.125 2024-08-15 18:48:25,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2024-08-15 18:48:32,286 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 18:48:35,253 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 18:48:42,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.10 vs. 
limit=15.0 2024-08-15 18:48:51,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3320620.0, ans=0.125 2024-08-15 18:48:54,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3320620.0, ans=0.0 2024-08-15 18:49:06,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3320720.0, ans=0.125 2024-08-15 18:49:06,931 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13250, loss[loss=0.0849, beats_loss=0.01059, ecapa_loss=0.0002209, whisper_loss=0.0721, over 13836.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01049, ecapa_loss=0.0001507, whisper_loss=0.09158, over 3875984.05 frames. ], batch size: 62, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:49:30,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.301e+01 2.680e+01 3.189e+01 5.288e+01, threshold=5.359e+01, percent-clipped=1.0 2024-08-15 18:49:38,311 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:49:44,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2024-08-15 18:49:45,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3320920.0, ans=0.0 2024-08-15 18:49:47,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3320920.0, ans=12.0 2024-08-15 18:49:48,432 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 18:50:22,876 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 18:50:25,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3321120.0, ans=0.125 2024-08-15 18:50:26,375 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 18:50:28,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3321220.0, ans=0.0 2024-08-15 18:50:29,652 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13300, loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001569, whisper_loss=0.09156, over 21266.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001505, whisper_loss=0.09105, over 3858542.59 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:50:41,389 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 18:50:50,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3321320.0, ans=0.125 2024-08-15 18:51:00,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3321320.0, ans=0.125 2024-08-15 18:51:39,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3321620.0, ans=0.0 2024-08-15 18:51:39,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3321620.0, ans=0.1 2024-08-15 18:51:45,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3321620.0, ans=0.125 2024-08-15 18:51:55,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13350, loss[loss=0.08685, beats_loss=0.009707, ecapa_loss=0.0001616, whisper_loss=0.07553, over 16210.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001499, whisper_loss=0.09042, over 3834124.83 frames. ], batch size: 64, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:51:56,178 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 18:51:56,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3321720.0, ans=0.2 2024-08-15 18:52:07,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5 2024-08-15 18:52:08,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. limit=10.0 2024-08-15 18:52:11,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2024-08-15 18:52:15,751 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 18:52:21,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.271e+01 2.660e+01 2.983e+01 5.401e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-15 18:52:21,315 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 18:52:29,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-15 18:52:33,612 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-15 18:52:51,306 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 18:52:51,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3322020.0, ans=0.125 2024-08-15 18:52:54,422 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-15 18:52:59,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3322020.0, ans=0.125 2024-08-15 18:53:03,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3322020.0, ans=0.1 2024-08-15 18:53:15,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3322120.0, ans=0.125 2024-08-15 18:53:23,026 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13400, loss[loss=0.1009, beats_loss=0.009819, ecapa_loss=0.0001709, whisper_loss=0.08934, over 13914.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001495, whisper_loss=0.09056, over 3828039.13 frames. 
], batch size: 56, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:53:27,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3322220.0, ans=0.2 2024-08-15 18:53:53,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3322320.0, ans=0.2 2024-08-15 18:53:56,688 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:54:01,896 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:54:01,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3322420.0, ans=0.125 2024-08-15 18:54:10,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3322420.0, ans=0.0 2024-08-15 18:54:50,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13450, loss[loss=0.09994, beats_loss=0.00905, ecapa_loss=0.0001678, whisper_loss=0.08922, over 22432.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001497, whisper_loss=0.09007, over 3857030.13 frames. 
], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:54:50,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3322720.0, ans=0.2 2024-08-15 18:55:06,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3322820.0, ans=0.025 2024-08-15 18:55:14,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.424e+01 2.655e+01 2.945e+01 1.400e+03, threshold=5.311e+01, percent-clipped=0.0 2024-08-15 18:55:14,894 WARNING [optim.py:496] (1/4) Scaling gradients by 0.037934403866529465, model_norm_threshold=53.10542297363281 2024-08-15 18:55:15,065 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.249e+05, grad_sumsq=5.178e+07, orig_rms_sq=1.014e-02 2024-08-15 18:55:21,381 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 18:55:26,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3322920.0, ans=0.2 2024-08-15 18:55:37,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3322920.0, ans=0.1 2024-08-15 18:55:44,712 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:56:04,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3323120.0, ans=0.2 2024-08-15 18:56:16,031 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13500, loss[loss=0.09629, beats_loss=0.01014, ecapa_loss=0.0001581, whisper_loss=0.08457, over 22952.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001495, whisper_loss=0.09013, over 3844016.15 frames. ], batch size: 94, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:56:31,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3323220.0, ans=0.2 2024-08-15 18:56:33,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323320.0, ans=0.1 2024-08-15 18:56:39,028 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 18:56:44,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3323320.0, ans=0.125 2024-08-15 18:56:56,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3323420.0, ans=0.125 2024-08-15 18:57:02,174 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.383e-01 2024-08-15 18:57:07,322 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 18:57:24,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3323520.0, ans=0.2 2024-08-15 18:57:31,041 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 36 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 18:57:44,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13550, loss[loss=0.1176, beats_loss=0.009252, ecapa_loss=0.0001696, whisper_loss=0.1067, over 22524.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001486, whisper_loss=0.08976, over 3845343.35 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:57:53,712 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 19 from Vox, 39 from AS
2024-08-15 18:58:08,435 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.273e+01 2.505e+01 2.907e+01 8.129e+01, threshold=5.010e+01, percent-clipped=4.0
2024-08-15 18:58:42,886 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 15 from LS+wenet, 18 from Vox, 43 from AS
2024-08-15 18:58:47,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3324020.0, ans=0.125
2024-08-15 18:59:10,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13600, loss[loss=0.09588, beats_loss=0.009643, ecapa_loss=0.0001468, whisper_loss=0.08477, over 21878.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.000149, whisper_loss=0.0902, over 3878746.03 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 18:59:13,351 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 from AS
2024-08-15 18:59:15,439 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 from AS
2024-08-15 18:59:18,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0
2024-08-15 18:59:29,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0
2024-08-15 18:59:33,402 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 from AS
2024-08-15 18:59:33,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5
2024-08-15 18:59:39,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3324320.0, ans=0.0
2024-08-15 18:59:45,888 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 18 from Vox, 36 from AS
2024-08-15 18:59:49,163 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 23 from Vox, 27 from AS
2024-08-15 18:59:52,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3324420.0, ans=0.07
2024-08-15 19:00:01,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3324520.0, ans=0.0
2024-08-15 19:00:28,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3324620.0, ans=0.125
2024-08-15 19:00:29,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3324620.0, ans=0.125
2024-08-15 19:00:34,808 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13650, loss[loss=0.1081, beats_loss=0.01166, ecapa_loss=0.0001008, whisper_loss=0.09542, over 18943.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001485, whisper_loss=0.08975, over 3875736.26 frames. ], batch size: 69, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 19:00:43,990 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 12 from Vox, 30 from AS
2024-08-15 19:00:44,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3324720.0, ans=0.0
2024-08-15 19:00:58,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.294e+01 2.538e+01 2.832e+01 8.240e+01, threshold=5.075e+01, percent-clipped=1.0
2024-08-15 19:01:04,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3324820.0, ans=0.0
2024-08-15 19:01:28,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3325020.0, ans=0.09899494936611666
2024-08-15 19:01:31,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3325020.0, ans=0.125
2024-08-15 19:01:40,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=12.0
2024-08-15 19:01:51,141 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS
2024-08-15 19:01:59,443 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13700, loss[loss=0.1252, beats_loss=0.01034, ecapa_loss=0.0001675, whisper_loss=0.1132, over 16794.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001481, whisper_loss=0.08984, over 3819058.30 frames. ], batch size: 62, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 19:02:12,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3325220.0, ans=0.125
2024-08-15 19:02:21,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3325320.0, ans=0.125
2024-08-15 19:02:44,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3325420.0, ans=0.125
2024-08-15 19:02:48,278 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.210e+01
2024-08-15 19:02:51,635 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 from AS
2024-08-15 19:25:26,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3325620.0, ans=0.0
2024-08-15 19:25:26,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3325620.0, ans=0.125
2024-08-15 19:59:43,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13750, loss[loss=0.09609, beats_loss=0.01338, ecapa_loss=0.0001365, whisper_loss=0.08135, over 15640.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001476, whisper_loss=0.0899, over 3809993.67 frames. ], batch size: 62, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 19:59:43,441 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 33 from LS+wenet, 15 from Vox, 20 from AS
2024-08-15 20:03:59,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3325720.0, ans=0.0
2024-08-15 20:07:56,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3325720.0, ans=0.125
2024-08-15 20:11:09,008 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 9 from Vox, 31 from AS
2024-08-15 20:47:28,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.373e+01 2.597e+01 2.828e+01 1.512e+02, threshold=5.195e+01, percent-clipped=2.0
2024-08-15 20:52:22,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.67 vs. limit=15.0
2024-08-15 22:02:37,140 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 39 from LS+wenet, 16 from Vox, 33 from AS
2024-08-15 22:06:33,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3326020.0, ans=0.125
2024-08-15 22:32:16,921 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 from AS
2024-08-15 22:34:48,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3326120.0, ans=0.125
2024-08-15 22:41:52,440 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13800, loss[loss=0.1136, beats_loss=0.00957, ecapa_loss=0.0001431, whisper_loss=0.1026, over 23037.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001486, whisper_loss=0.09117, over 3852368.48 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 22:51:07,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3326220.0, ans=0.125
2024-08-15 23:34:42,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3326320.0, ans=0.125
2024-08-16 00:34:03,373 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 from AS
2024-08-16 00:39:14,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=22.5
2024-08-16 01:12:17,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13850, loss[loss=0.1078, beats_loss=0.01099, ecapa_loss=0.0001709, whisper_loss=0.09508, over 18308.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.000148, whisper_loss=0.091, over 3856965.30 frames. ], batch size: 75, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 01:25:30,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3326720.0, ans=0.125
2024-08-16 01:33:57,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.46 vs. limit=22.5
2024-08-16 01:55:15,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.228e+01 2.445e+01 2.757e+01 2.786e+02, threshold=4.891e+01, percent-clipped=1.0
2024-08-16 02:20:56,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3326920.0, ans=0.125
2024-08-16 02:24:07,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3326920.0, ans=0.0
2024-08-16 03:08:56,745 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 from AS
2024-08-16 03:09:32,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3327020.0, ans=0.125
2024-08-16 03:42:04,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3327120.0, ans=0.125
2024-08-16 03:47:18,316 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13900, loss[loss=0.1108, beats_loss=0.009938, ecapa_loss=0.0001794, whisper_loss=0.09909, over 22565.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.000148, whisper_loss=0.09108, over 3854683.70 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 03:47:46,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3327220.0, ans=0.0
2024-08-16 04:04:58,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3327220.0, ans=0.0
2024-08-16 04:26:44,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3327320.0, ans=0.0
2024-08-16 04:57:15,729 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 from AS
2024-08-16 05:06:23,729 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 from AS
2024-08-16 05:30:45,861 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS
2024-08-16 05:56:32,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3327620.0, ans=0.5
2024-08-16 06:12:42,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3327620.0, ans=0.125
2024-08-16 06:16:02,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 13950, loss[loss=0.1298, beats_loss=0.008264, ecapa_loss=0.0001327, whisper_loss=0.1202, over 20910.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01051, ecapa_loss=0.0001499, whisper_loss=0.09257, over 3876533.14 frames. ], batch size: 81, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 06:26:03,575 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 from AS
2024-08-16 06:39:16,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.69 vs. limit=10.0
2024-08-16 06:42:00,788 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 from AS
2024-08-16 06:57:05,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.658e+01 3.040e+01 4.567e+01, threshold=5.316e+01, percent-clipped=0.0
2024-08-16 07:08:53,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0
2024-08-16 07:41:47,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3327920.0, ans=0.125
2024-08-16 07:43:44,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3327920.0, ans=0.0
2024-08-16 08:22:17,120 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS
2024-08-16 08:41:08,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3328120.0, ans=0.125
2024-08-16 09:02:46,048 INFO [train_multi_KD3.py:1116] (1/4) Epoch 23, batch 14000, loss[loss=0.09693, beats_loss=0.01145, ecapa_loss=0.0001447, whisper_loss=0.08403, over 23458.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01051, ecapa_loss=0.0001492, whisper_loss=0.09308, over 3908860.25 frames. ], batch size: 94, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 09:11:13,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3328220.0, ans=0.07
2024-08-16 09:17:58,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3328220.0, ans=0.2
2024-08-16 09:24:52,783 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 from AS
2024-08-16 09:38:59,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0
2024-08-16 09:44:14,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3328320.0, ans=0.2