2024-08-13 14:40:41,662 INFO [train_multi_KD3.py:1187] (0/4) Training started
2024-08-13 14:40:41,667 INFO [train_multi_KD3.py:1197] (0/4) Device: cuda:0
2024-08-13 14:40:41,669 INFO [train_multi_KD3.py:1212] (0/4) Using dtype=torch.bfloat16
2024-08-13 14:40:41,669 INFO [train_multi_KD3.py:1214] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 16, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-13 14:40:41,669 INFO [train_multi_KD3.py:1216] (0/4) About to create model
2024-08-13 14:40:42,113 INFO [model_shift.py:142] (0/4) Delta_t: 6 when computing the distillation loss
2024-08-13 14:40:42,119 INFO [train_multi_KD3.py:1220] (0/4) Number of model parameters: 66484678
2024-08-13 14:40:42,750 INFO [checkpoint.py:112] (0/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-15.pt
2024-08-13 14:40:43,639 INFO [checkpoint.py:131] (0/4) Loading averaged model
2024-08-13 14:40:44,993 INFO [train_multi_KD3.py:1235] (0/4) Using DDP
2024-08-13 14:40:46,710 INFO [train_multi_KD3.py:1247] (0/4) Loading optimizer state dict
2024-08-13 14:40:47,086 INFO [train_multi_KD3.py:1255] (0/4) Loading scheduler state dict
2024-08-13 14:40:47,086 INFO [kd_datamodule.py:690] (0/4) About to get train 960 cuts
2024-08-13 14:40:47,135 INFO [train_multi_KD3.py:1306] (0/4) Getting audioset cuts
2024-08-13 14:40:47,135 INFO [kd_datamodule.py:900] (0/4) About to get the audioset cuts for KD.
2024-08-13 14:40:47,138 INFO [kd_datamodule.py:869] (0/4) About to get the voxceleb cuts.
2024-08-13 14:40:47,139 INFO [kd_datamodule.py:880] (0/4) Adding voxceleb2 cuts.
2024-08-13 14:40:47,141 INFO [train_multi_KD3.py:1320] (0/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-13 14:40:56,589 INFO [train_multi_KD3.py:1322] (0/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]]
2024-08-13 14:40:56,589 INFO [train_multi_KD3.py:1323] (0/4) Using weights: [1406195, 1904746, 1187704]
2024-08-13 14:40:56,589 INFO [train_multi_KD3.py:1332] (0/4) CutSet(len=4498645) [underlying data type: ]
2024-08-13 14:40:56,589 INFO [kd_datamodule.py:449] (0/4) Disable MUSAN
2024-08-13 14:40:56,591 INFO [kd_datamodule.py:489] (0/4) Disable SpecAugment
2024-08-13 14:40:56,591 INFO [kd_datamodule.py:491] (0/4) About to create train dataset
2024-08-13 14:40:56,592 INFO [kd_datamodule.py:528] (0/4) Using SimpleCutSampler
2024-08-13 14:40:56,592 INFO [kd_datamodule.py:536] (0/4) About to create train dataloader
2024-08-13 14:40:56,595 INFO [kd_datamodule.py:763] (0/4) About to get dev-clean cuts
2024-08-13 14:40:56,597 INFO [kd_datamodule.py:781] (0/4) About to get dev-other cuts
2024-08-13 14:40:56,598 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-13 14:40:56,898 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-13 14:40:56,898 INFO [kd_datamodule.py:840] (0/4) About to get the test set of voxceleb1 set.
2024-08-13 14:40:56,899 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-13 14:40:57,138 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-13 14:40:57,138 INFO [kd_datamodule.py:912] (0/4) About to get the audioset eval cuts.
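The mux step logged above interleaves the three training CutSets in proportion to their sizes: the logged weights [1406195, 1904746, 1187704] sum to 4498645, the length of the merged CutSet. A minimal sketch of that proportional interleaving, assuming weighted sampling in the style of lhotse's `CutSet.mux` (the `mux` helper here is illustrative, not the library's implementation):

```python
import random

# Sizes reported in the log for the three training CutSets.
weights = [1406195, 1904746, 1187704]
assert sum(weights) == 4498645  # length of the merged CutSet in the log

def mux(sources, weights, rng):
    # Draw the next example from source i with probability
    # weights[i] / sum(live weights), until all sources are exhausted.
    sources = [list(s) for s in sources]
    while any(sources):
        live = [i for i, s in enumerate(sources) if s]
        i = rng.choices(live, weights=[weights[j] for j in live])[0]
        yield sources[i].pop(0)

rng = random.Random(42)
merged = list(mux([["ls"] * 3, ["as"] * 2, ["vox"] * 1], [3, 2, 1], rng))
print(len(merged))  # 6: every item from every source appears once
```

Larger corpora therefore appear proportionally more often in the merged stream, which matches weights being set equal to the CutSet lengths.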
2024-08-13 14:40:57,139 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-13 14:40:57,744 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-13 14:40:57,744 INFO [train_multi_KD3.py:1412] (0/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-13 14:40:57,744 INFO [train_multi_KD3.py:1416] (0/4) Loading grad scaler state dict
2024-08-13 14:41:09,430 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 14:41:13,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 0, loss[loss=0.1045, beats_loss=0.009272, ecapa_loss=0.0001665, whisper_loss=0.09354, over 22408.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.009272, ecapa_loss=0.0001665, whisper_loss=0.09354, over 22408.00 frames. ], batch size: 92, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:41:13,689 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-13 14:41:44,609 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005685, whisper_loss=0.2484, over 922467.00 frames.
2024-08-13 14:41:58,070 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on SV_voxceleb1: loss=0.004519, beats_loss=0, ecapa_loss=0.0004519, whisper_loss=0, over 939242.00 frames.
2024-08-13 14:42:16,927 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4865, 2.9866, 3.0745, 2.8007], device='cuda:0')
2024-08-13 14:42:42,062 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.5568, 1.8095, 1.9859, 1.9097], device='cuda:0')
2024-08-13 14:43:30,522 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on AT_audioset: loss=0.02374, beats_loss=0.02374, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
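The per-task losses in the batch records above appear to combine using the configured scales (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0): for batch 0, 0.009272 + 10.0 × 0.0001665 + 0.09354 ≈ 0.1045, the logged total. A sketch under that assumption (not the training script's actual code; `total_loss` is a hypothetical helper):

```python
# Scales taken from the logged hyperparameter dict.
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def total_loss(beats_loss, ecapa_loss, whisper_loss):
    # Assumed combination; consistent with the logged totals.
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Epoch 16, batch 0 values from the log:
print(round(total_loss(0.009272, 0.0001665, 0.09354), 4))  # 0.1045
```

The later batch records are consistent with the same combination, e.g. batch 50's tot_loss 0.009706 + 10.0 × 0.0001691 + 0.0914 ≈ 0.1028.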
2024-08-13 14:43:30,525 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB
2024-08-13 14:43:30,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2173810.0, ans=0.1
2024-08-13 14:43:31,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0
2024-08-13 14:44:09,194 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS
2024-08-13 14:45:18,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2174110.0, ans=0.1
2024-08-13 14:45:25,832 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 from AS
2024-08-13 14:45:28,781 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 from AS
2024-08-13 14:45:59,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 50, loss[loss=0.1232, beats_loss=0.009746, ecapa_loss=0.0001596, whisper_loss=0.1119, over 23102.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.009706, ecapa_loss=0.0001691, whisper_loss=0.0914, over 902191.82 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:46:38,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.896e+01 3.246e+01 4.521e+01, threshold=5.792e+01, percent-clipped=0.0
2024-08-13 14:47:34,598 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 from AS
2024-08-13 14:48:11,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2174610.0, ans=0.0
2024-08-13 14:49:14,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 100, loss[loss=0.1056, beats_loss=0.01099, ecapa_loss=0.0001116, whisper_loss=0.09353, over 17527.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01017, ecapa_loss=0.0001649, whisper_loss=0.0894, over 1583138.14 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:49:26,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2174810.0, ans=0.125
2024-08-13 14:50:07,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2174910.0, ans=0.1
2024-08-13 14:50:15,924 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 from AS
2024-08-13 14:50:28,581 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 from AS
2024-08-13 14:50:34,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2174910.0, ans=0.1
2024-08-13 14:52:29,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0
2024-08-13 14:52:32,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 150, loss[loss=0.1006, beats_loss=0.0109, ecapa_loss=0.0001889, whisper_loss=0.08781, over 22691.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01015, ecapa_loss=0.0001645, whisper_loss=0.09053, over 2096617.56 frames. ], batch size: 97, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:52:44,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2175310.0, ans=0.125
2024-08-13 14:53:06,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.627e+01 2.921e+01 3.180e+01 8.449e+01, threshold=5.841e+01, percent-clipped=2.0
2024-08-13 14:53:36,575 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 from AS
2024-08-13 14:53:41,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2175510.0, ans=0.0
2024-08-13 14:54:20,989 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 from AS
2024-08-13 14:54:26,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2175610.0, ans=0.0
2024-08-13 14:54:30,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2175610.0, ans=0.125
2024-08-13 14:54:33,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2175610.0, ans=0.1
2024-08-13 14:55:05,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 200, loss[loss=0.1219, beats_loss=0.009219, ecapa_loss=0.0001634, whisper_loss=0.1111, over 22668.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01015, ecapa_loss=0.0001645, whisper_loss=0.09156, over 2485033.81 frames. ], batch size: 87, lr: 3.98e-03, grad_scale: 5.764607523034235e+17
2024-08-13 14:55:16,250 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 from AS
2024-08-13 14:55:32,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2175910.0, ans=0.125
2024-08-13 14:55:35,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2175910.0, ans=0.0
2024-08-13 14:55:38,099 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 from AS
2024-08-13 14:55:46,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2176010.0, ans=0.0
2024-08-13 14:55:58,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2176110.0, ans=0.125
2024-08-13 14:56:03,510 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS
2024-08-13 14:56:14,394 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 from AS
2024-08-13 14:56:15,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2176210.0, ans=0.1
2024-08-13 14:56:28,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2176210.0, ans=0.125
2024-08-13 14:56:29,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2176210.0, ans=0.125
2024-08-13 14:56:31,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2176310.0, ans=0.125
2024-08-13 14:56:32,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 250, loss[loss=0.1208, beats_loss=0.01005, ecapa_loss=0.0001647, whisper_loss=0.1091, over 19366.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01026, ecapa_loss=0.0001634, whisper_loss=0.09206, over 2776898.48 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 14:56:34,168 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 from AS
2024-08-13 14:56:40,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2176310.0, ans=0.125
2024-08-13 14:56:45,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2176310.0, ans=0.125
2024-08-13 14:56:47,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.294e+01 2.573e+01 2.919e+01 5.746e+01, threshold=5.146e+01, percent-clipped=0.0
2024-08-13 14:56:58,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2176410.0, ans=0.0
2024-08-13 14:57:09,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2176510.0, ans=0.125
2024-08-13 14:57:11,977 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 from AS
2024-08-13 14:57:21,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2176610.0, ans=0.2
2024-08-13 14:57:32,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2176610.0, ans=0.125
2024-08-13 14:57:33,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2176610.0, ans=0.125
2024-08-13 14:57:34,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2176610.0, ans=0.0
2024-08-13 14:57:38,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2176710.0, ans=0.2
2024-08-13 14:57:50,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0
2024-08-13 14:57:56,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 300, loss[loss=0.09773, beats_loss=0.01042, ecapa_loss=0.0001748, whisper_loss=0.08556, over 22301.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.0001639, whisper_loss=0.09138, over 3011003.15 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 14:58:05,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2176810.0, ans=0.0
2024-08-13 14:58:05,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.82 vs. limit=6.0
2024-08-13 14:58:54,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2177110.0, ans=0.125
2024-08-13 14:59:13,284 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 14:59:15,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 350, loss[loss=0.09219, beats_loss=0.01149, ecapa_loss=0.0001716, whisper_loss=0.07898, over 17334.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001609, whisper_loss=0.09043, over 3208707.28 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 14:59:25,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2177310.0, ans=0.2
2024-08-13 14:59:31,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.361e+01 2.664e+01 2.951e+01 4.705e+01, threshold=5.328e+01, percent-clipped=0.0
2024-08-13 14:59:43,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0
2024-08-13 15:00:01,705 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 19 from LS+wenet, 28 from Vox, 46 from AS
2024-08-13 15:00:03,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2177610.0, ans=0.0
2024-08-13 15:00:22,519 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 from AS
2024-08-13 15:00:30,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2024-08-13 15:00:32,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 400, loss[loss=0.09056, beats_loss=0.01195, ecapa_loss=0.0001801, whisper_loss=0.0768, over 18066.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001611, whisper_loss=0.08966, over 3354116.98 frames. ], batch size: 76, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:00:56,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2177910.0, ans=0.0
2024-08-13 15:00:58,366 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-13 15:01:11,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2178010.0, ans=0.0
2024-08-13 15:01:13,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2178010.0, ans=0.0
2024-08-13 15:01:17,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2178110.0, ans=0.125
2024-08-13 15:01:26,503 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 from AS
2024-08-13 15:01:39,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2178210.0, ans=0.125
2024-08-13 15:01:39,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2178210.0, ans=0.0
2024-08-13 15:01:42,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2178210.0, ans=0.125
2024-08-13 15:01:47,807 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 450, loss[loss=0.114, beats_loss=0.008926, ecapa_loss=0.000157, whisper_loss=0.1035, over 18592.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001614, whisper_loss=0.09063, over 3462100.05 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:02:02,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.405e+01 2.560e+01 2.967e+01 1.017e+02, threshold=5.120e+01, percent-clipped=1.0
2024-08-13 15:02:08,958 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 15:02:11,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-08-13 15:02:16,143 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 from AS
2024-08-13 15:02:34,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2178610.0, ans=0.1
2024-08-13 15:02:41,363 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS
2024-08-13 15:02:56,767 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 19 from Vox, 32 from AS
2024-08-13 15:02:59,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2178810.0, ans=0.125
2024-08-13 15:03:00,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 500, loss[loss=0.1116, beats_loss=0.009262, ecapa_loss=0.0001329, whisper_loss=0.101, over 19707.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001607, whisper_loss=0.0901, over 3534361.21 frames. ], batch size: 73, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:03:17,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2178910.0, ans=0.1
2024-08-13 15:03:25,747 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 from AS
2024-08-13 15:03:36,665 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 from AS
2024-08-13 15:04:09,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2179210.0, ans=0.125
2024-08-13 15:04:09,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2179210.0, ans=0.125
2024-08-13 15:04:14,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 550, loss[loss=0.1149, beats_loss=0.01009, ecapa_loss=0.0001984, whisper_loss=0.1028, over 14709.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001607, whisper_loss=0.09007, over 3583552.13 frames. ], batch size: 60, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:04:20,684 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 from AS
2024-08-13 15:04:29,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.286e+01 2.542e+01 2.908e+01 4.014e+01, threshold=5.083e+01, percent-clipped=0.0
2024-08-13 15:04:29,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2179410.0, ans=0.125
2024-08-13 15:04:29,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2179410.0, ans=0.1
2024-08-13 15:04:34,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=12.0
2024-08-13 15:04:41,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2179410.0, ans=0.125
2024-08-13 15:04:56,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2179510.0, ans=0.125
2024-08-13 15:04:57,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2179610.0, ans=0.2
2024-08-13 15:05:04,459 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.704e-02
2024-08-13 15:05:04,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2179610.0, ans=0.0
2024-08-13 15:05:08,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2179610.0, ans=0.0
2024-08-13 15:05:12,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0
2024-08-13 15:05:15,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2179710.0, ans=0.125
2024-08-13 15:05:28,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2179810.0, ans=0.125
2024-08-13 15:05:29,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 600, loss[loss=0.1059, beats_loss=0.01114, ecapa_loss=0.0001597, whisper_loss=0.0932, over 18325.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001603, whisper_loss=0.09011, over 3665373.79 frames. ], batch size: 73, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:05:43,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0
2024-08-13 15:06:08,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2180010.0, ans=0.2
2024-08-13 15:06:21,240 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 18 from Vox, 17 from AS
2024-08-13 15:06:26,932 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 from AS
2024-08-13 15:06:31,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2180210.0, ans=0.125
2024-08-13 15:06:39,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0
2024-08-13 15:06:40,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 650, loss[loss=0.088, beats_loss=0.01342, ecapa_loss=0.0001722, whisper_loss=0.07286, over 20793.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001615, whisper_loss=0.0904, over 3669429.64 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:06:42,229 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 from AS
2024-08-13 15:06:48,451 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 25 from Vox, 35 from AS
2024-08-13 15:06:53,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2180310.0, ans=0.125
2024-08-13 15:06:55,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.462e+01 2.734e+01 3.167e+01 1.676e+02, threshold=5.468e+01, percent-clipped=3.0
2024-08-13 15:06:57,655 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 13 from Vox, 34 from AS
2024-08-13 15:07:10,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2180510.0, ans=0.125
2024-08-13 15:07:32,704 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 15:07:39,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2180710.0, ans=0.0
2024-08-13 15:07:41,748 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 31 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 15:07:47,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2180710.0, ans=0.0
2024-08-13 15:07:54,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 700, loss[loss=0.07713, beats_loss=0.01044, ecapa_loss=0.0001754, whisper_loss=0.06494, over 16569.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001619, whisper_loss=0.09029, over 3686598.42 frames. ], batch size: 67, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:08:21,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2181010.0, ans=0.125
2024-08-13 15:08:40,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0
2024-08-13 15:08:59,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2181210.0, ans=0.125
2024-08-13 15:09:06,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 750, loss[loss=0.09579, beats_loss=0.0116, ecapa_loss=0.0001671, whisper_loss=0.08252, over 17834.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001607, whisper_loss=0.09041, over 3739754.93 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:09:06,822 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS
2024-08-13 15:09:21,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.329e+01 2.542e+01 2.977e+01 1.085e+02, threshold=5.083e+01, percent-clipped=1.0
2024-08-13 15:09:24,096 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 18 from Vox, 23 from AS
2024-08-13 15:09:38,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2181510.0, ans=0.125
2024-08-13 15:09:42,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2181510.0, ans=0.05
2024-08-13 15:09:50,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0
2024-08-13 15:09:53,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2181610.0, ans=0.1
2024-08-13 15:09:59,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5
2024-08-13 15:10:01,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2181610.0, ans=0.0
2024-08-13 15:10:18,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 800, loss[loss=0.09511, beats_loss=0.01039, ecapa_loss=0.0001731, whisper_loss=0.08299, over 19424.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001606, whisper_loss=0.0906, over 3779489.96 frames. ], batch size: 79, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:10:19,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2181810.0, ans=0.2
2024-08-13 15:10:35,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2024-08-13 15:10:42,204 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 from AS
2024-08-13 15:10:42,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2181910.0, ans=0.0
2024-08-13 15:10:43,420 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 from AS
2024-08-13 15:10:45,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2181910.0, ans=0.1
2024-08-13 15:10:52,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2182010.0, ans=0.125
2024-08-13 15:10:54,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0
2024-08-13 15:10:58,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0
2024-08-13 15:11:22,300 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 15:11:25,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2182210.0, ans=0.0
2024-08-13 15:11:32,346 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 850, loss[loss=0.1131, beats_loss=0.01039, ecapa_loss=0.000118, whisper_loss=0.1015, over 19043.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001609, whisper_loss=0.09056, over 3801159.02 frames. ], batch size: 69, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 15:11:46,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.398e+01 2.663e+01 2.990e+01 7.176e+01, threshold=5.326e+01, percent-clipped=1.0
2024-08-13 15:11:49,674 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 from AS
2024-08-13 15:11:58,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2182410.0, ans=0.125
2024-08-13 15:12:02,947 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS
2024-08-13 15:12:03,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2182510.0, ans=0.125
2024-08-13 15:12:05,814 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 15:12:16,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.85 vs. limit=22.5
2024-08-13 15:12:24,228 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts.
19 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 15:12:44,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 900, loss[loss=0.1106, beats_loss=0.01135, ecapa_loss=0.0001351, whisper_loss=0.09793, over 24371.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001599, whisper_loss=0.08998, over 3820928.70 frames. ], batch size: 92, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:13:18,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2183010.0, ans=0.07 2024-08-13 15:13:39,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2183110.0, ans=0.1 2024-08-13 15:13:45,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2183210.0, ans=0.2 2024-08-13 15:13:48,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2183210.0, ans=0.125 2024-08-13 15:13:53,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2183210.0, ans=0.125 2024-08-13 15:13:59,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 950, loss[loss=0.08981, beats_loss=0.01213, ecapa_loss=0.0001328, whisper_loss=0.07635, over 16100.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0107, ecapa_loss=0.0001594, whisper_loss=0.08944, over 3795168.35 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:14:11,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-13 15:14:12,254 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
27 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-13 15:14:13,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.387e+01 2.716e+01 2.954e+01 4.081e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 15:14:13,679 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 15:14:18,470 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 15:14:29,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-13 15:14:51,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2183610.0, ans=0.125 2024-08-13 15:14:53,180 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-13 15:14:59,010 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 15:15:07,101 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 15:15:09,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2024-08-13 15:15:11,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2024-08-13 15:15:14,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1000, loss[loss=0.08886, beats_loss=0.01031, ecapa_loss=0.0001622, whisper_loss=0.07693, over 19067.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01074, ecapa_loss=0.0001585, whisper_loss=0.08893, over 3801216.58 frames. 
], batch size: 77, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:15:43,671 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 15:15:53,850 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 15:16:03,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2184110.0, ans=0.125 2024-08-13 15:16:08,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2184110.0, ans=0.125 2024-08-13 15:16:10,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.05 vs. limit=22.5 2024-08-13 15:16:28,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2184310.0, ans=10.0 2024-08-13 15:16:29,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1050, loss[loss=0.1305, beats_loss=0.00945, ecapa_loss=0.0001487, whisper_loss=0.1196, over 18217.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.0001577, whisper_loss=0.08948, over 3798312.64 frames. 
], batch size: 69, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:16:34,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2184310.0, ans=0.125 2024-08-13 15:16:43,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2184410.0, ans=0.0 2024-08-13 15:16:43,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.435e+01 2.686e+01 3.027e+01 6.105e+01, threshold=5.372e+01, percent-clipped=2.0 2024-08-13 15:16:44,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-13 15:16:51,387 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 15:17:13,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2184610.0, ans=0.0 2024-08-13 15:17:36,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2184710.0, ans=0.0 2024-08-13 15:17:42,754 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 15:17:43,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1100, loss[loss=0.1193, beats_loss=0.01037, ecapa_loss=0.0001625, whisper_loss=0.1074, over 21972.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001569, whisper_loss=0.09035, over 3808529.12 frames. ], batch size: 86, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:17:57,897 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-13 15:17:59,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2184910.0, ans=0.125 2024-08-13 15:18:02,310 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 15:18:14,504 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 15:18:41,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2185110.0, ans=0.1 2024-08-13 15:18:56,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2185210.0, ans=0.125 2024-08-13 15:19:00,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1150, loss[loss=0.07914, beats_loss=0.01022, ecapa_loss=0.0001788, whisper_loss=0.06713, over 22166.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0106, ecapa_loss=0.0001578, whisper_loss=0.0916, over 3851444.39 frames. ], batch size: 92, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:19:01,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2185310.0, ans=0.1 2024-08-13 15:19:16,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.495e+01 2.743e+01 3.086e+01 4.866e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 15:19:16,823 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 15:19:17,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0 2024-08-13 15:19:22,411 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 15:19:24,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2185410.0, ans=0.125 2024-08-13 15:19:48,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-13 15:20:19,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2185610.0, ans=0.125 2024-08-13 15:20:45,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1200, loss[loss=0.09806, beats_loss=0.01032, ecapa_loss=0.0002042, whisper_loss=0.08569, over 21206.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001585, whisper_loss=0.09119, over 3838709.82 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:21:03,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2185910.0, ans=0.125 2024-08-13 15:21:05,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2185910.0, ans=0.125 2024-08-13 15:21:09,987 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-13 15:21:12,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2024-08-13 15:21:55,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2186210.0, ans=0.1 2024-08-13 15:21:55,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. 
limit=6.0 2024-08-13 15:22:07,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1250, loss[loss=0.1027, beats_loss=0.0101, ecapa_loss=0.0001713, whisper_loss=0.09089, over 19619.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01072, ecapa_loss=0.0001579, whisper_loss=0.08991, over 3846798.72 frames. ], batch size: 79, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:22:14,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2186310.0, ans=0.0 2024-08-13 15:22:22,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.204e+01 2.472e+01 2.749e+01 3.995e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-13 15:22:29,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2186410.0, ans=0.1 2024-08-13 15:22:34,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2186410.0, ans=0.1 2024-08-13 15:22:35,456 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 17 from Vox, 53 fro AS 2024-08-13 15:22:44,638 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 15:22:51,115 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 15:22:57,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2186610.0, ans=0.2 2024-08-13 15:23:25,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1300, loss[loss=0.1234, beats_loss=0.009046, ecapa_loss=0.0001475, whisper_loss=0.1129, over 21853.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.000158, whisper_loss=0.09005, over 3857934.12 frames. 
], batch size: 80, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:23:26,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2186810.0, ans=0.2 2024-08-13 15:23:27,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2186810.0, ans=0.0 2024-08-13 15:23:34,327 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 15:23:36,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2186810.0, ans=0.125 2024-08-13 15:23:53,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2186910.0, ans=0.1 2024-08-13 15:24:02,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2187010.0, ans=0.125 2024-08-13 15:24:15,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2187110.0, ans=0.125 2024-08-13 15:24:28,413 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-13 15:24:30,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=22.5 2024-08-13 15:24:41,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1350, loss[loss=0.1044, beats_loss=0.01215, ecapa_loss=0.0001556, whisper_loss=0.09074, over 22150.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001578, whisper_loss=0.09084, over 3856586.99 frames. 
], batch size: 89, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:25:00,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.385e+01 2.728e+01 3.101e+01 1.009e+02, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:25:03,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2187410.0, ans=0.1 2024-08-13 15:25:17,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2187510.0, ans=0.2 2024-08-13 15:25:23,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2187510.0, ans=0.125 2024-08-13 15:25:28,328 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.218e+00 2024-08-13 15:25:57,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1400, loss[loss=0.1029, beats_loss=0.0123, ecapa_loss=0.0001634, whisper_loss=0.08898, over 21842.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.000158, whisper_loss=0.09033, over 3857177.01 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:26:11,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2024-08-13 15:26:17,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2187910.0, ans=0.0 2024-08-13 15:26:17,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0 2024-08-13 15:26:30,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2188010.0, ans=0.2 2024-08-13 15:26:52,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2188110.0, ans=0.125 2024-08-13 15:26:52,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-13 15:27:23,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1450, loss[loss=0.1017, beats_loss=0.01124, ecapa_loss=0.0001314, whisper_loss=0.08915, over 21007.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01081, ecapa_loss=0.0001568, whisper_loss=0.08975, over 3843220.65 frames. ], batch size: 81, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:27:31,051 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 15:27:31,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-13 15:27:40,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.325e+01 2.552e+01 2.880e+01 5.017e+01, threshold=5.104e+01, percent-clipped=1.0 2024-08-13 15:28:00,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2188510.0, ans=15.0 2024-08-13 15:28:01,476 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 15:28:25,050 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 15:28:39,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. 
limit=22.5 2024-08-13 15:28:43,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1500, loss[loss=0.123, beats_loss=0.00851, ecapa_loss=0.0001322, whisper_loss=0.1132, over 19495.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01081, ecapa_loss=0.0001562, whisper_loss=0.08952, over 3836614.72 frames. ], batch size: 70, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:28:54,052 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 15:29:12,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2188910.0, ans=0.125 2024-08-13 15:29:14,361 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 15:29:15,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.25 vs. limit=22.5 2024-08-13 15:29:17,500 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 18 from LS+wenet, 32 from Vox, 43 fro AS 2024-08-13 15:29:43,080 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 33 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 15:29:54,839 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-13 15:30:01,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2189210.0, ans=0.1 2024-08-13 15:30:04,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1550, loss[loss=0.1012, beats_loss=0.01067, ecapa_loss=0.0001629, whisper_loss=0.08886, over 15334.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01076, ecapa_loss=0.0001567, whisper_loss=0.0895, over 3817785.80 frames. 
], batch size: 61, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:30:23,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.248e+01 2.490e+01 2.864e+01 4.046e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-13 15:30:59,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=2189610.0, ans=0.2 2024-08-13 15:31:17,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2189710.0, ans=0.1 2024-08-13 15:31:19,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=12.0 2024-08-13 15:31:20,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2189710.0, ans=0.125 2024-08-13 15:31:24,372 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 15:31:26,168 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=9.163e-02 2024-08-13 15:31:26,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1600, loss[loss=0.06494, beats_loss=0.01376, ecapa_loss=0.0001364, whisper_loss=0.04982, over 18343.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01079, ecapa_loss=0.0001565, whisper_loss=0.08964, over 3848331.29 frames. ], batch size: 77, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:32:46,621 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1650, loss[loss=0.1152, beats_loss=0.01321, ecapa_loss=0.0001213, whisper_loss=0.1008, over 23854.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.000157, whisper_loss=0.09028, over 3852465.63 frames. 
], batch size: 91, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:33:03,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.370e+01 2.654e+01 3.120e+01 7.882e+01, threshold=5.308e+01, percent-clipped=3.0 2024-08-13 15:33:14,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2190410.0, ans=0.09899494936611666 2024-08-13 15:33:20,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2190510.0, ans=0.125 2024-08-13 15:33:21,869 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 14 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-13 15:33:39,646 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 15:33:42,515 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 26 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-13 15:33:50,074 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 15:33:53,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2190710.0, ans=0.0 2024-08-13 15:34:04,930 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1700, loss[loss=0.1004, beats_loss=0.008912, ecapa_loss=0.0001727, whisper_loss=0.0898, over 16731.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001569, whisper_loss=0.08992, over 3856346.11 frames. 
], batch size: 66, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:34:05,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2190810.0, ans=0.0 2024-08-13 15:34:06,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2190810.0, ans=0.5 2024-08-13 15:34:18,417 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 15:34:27,603 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 15:34:30,725 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 15:34:32,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2190910.0, ans=0.0 2024-08-13 15:34:37,802 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.337e-01 2024-08-13 15:34:39,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=12.0 2024-08-13 15:34:45,929 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 15:35:07,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2191210.0, ans=0.0 2024-08-13 15:35:10,775 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 15:35:11,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2191210.0, ans=0.0 2024-08-13 15:35:18,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2191210.0, ans=0.125 2024-08-13 15:35:21,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1750, loss[loss=0.09624, beats_loss=0.009979, ecapa_loss=0.0001641, whisper_loss=0.08462, over 17154.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.0001574, whisper_loss=0.08999, over 3836305.29 frames. ], batch size: 70, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:35:24,660 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 15:35:28,578 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 15:35:37,236 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.448e+01 2.728e+01 3.089e+01 6.360e+01, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:35:56,806 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 5 from Vox, 27 fro AS 2024-08-13 15:36:08,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2191610.0, ans=0.0 2024-08-13 15:36:15,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2191610.0, ans=0.125 2024-08-13 15:36:16,651 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 15:36:35,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1800, loss[loss=0.1002, beats_loss=0.009668, ecapa_loss=0.0001603, whisper_loss=0.08894, over 17671.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001573, whisper_loss=0.09035, over 3835706.19 frames. ], batch size: 69, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:36:38,769 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 15:36:40,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2191810.0, ans=0.2 2024-08-13 15:36:57,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2191910.0, ans=0.125 2024-08-13 15:37:02,452 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.034e-01 2024-08-13 15:37:03,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0 2024-08-13 15:37:10,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2192010.0, ans=0.125 2024-08-13 15:37:11,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2192010.0, ans=0.0 2024-08-13 15:37:15,453 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-13 15:37:17,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2192010.0, ans=0.0 2024-08-13 15:37:46,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2192210.0, ans=0.0 2024-08-13 15:37:50,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1850, loss[loss=0.09664, beats_loss=0.01303, ecapa_loss=0.0001446, whisper_loss=0.08216, over 19556.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001581, whisper_loss=0.09019, over 3833309.16 frames. ], batch size: 78, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:37:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2192310.0, ans=0.09899494936611666 2024-08-13 15:38:06,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.626e+01 2.890e+01 6.922e+01, threshold=5.252e+01, percent-clipped=1.0 2024-08-13 15:38:08,651 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 15:38:12,631 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 15:38:17,454 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 15:38:23,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2192510.0, ans=0.1 2024-08-13 15:38:38,473 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 15:39:03,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1900, loss[loss=0.09206, beats_loss=0.01246, ecapa_loss=0.0001559, whisper_loss=0.07804, over 22326.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001587, whisper_loss=0.09, over 3820275.48 frames. 
], batch size: 90, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:39:14,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2192810.0, ans=0.125 2024-08-13 15:39:34,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2193010.0, ans=0.125 2024-08-13 15:40:02,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2193210.0, ans=0.125 2024-08-13 15:40:03,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2193210.0, ans=0.125 2024-08-13 15:40:09,760 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 15:40:18,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1950, loss[loss=0.1073, beats_loss=0.008554, ecapa_loss=0.0002045, whisper_loss=0.09667, over 16940.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001594, whisper_loss=0.09024, over 3854729.65 frames. ], batch size: 68, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:40:34,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.351e+01 2.582e+01 2.888e+01 8.249e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-13 15:40:48,144 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 15:40:49,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2193510.0, ans=0.125 2024-08-13 15:40:51,171 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 15:41:06,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2193610.0, ans=0.125 2024-08-13 15:41:09,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2193610.0, ans=0.04949747468305833 2024-08-13 15:41:27,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2193710.0, ans=0.125 2024-08-13 15:41:33,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2000, loss[loss=0.1026, beats_loss=0.0121, ecapa_loss=0.0001835, whisper_loss=0.08862, over 21338.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001592, whisper_loss=0.08992, over 3866059.19 frames. ], batch size: 84, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:41:48,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-13 15:41:51,040 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.227e-01 2024-08-13 15:41:58,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2193910.0, ans=0.1 2024-08-13 15:42:12,231 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 15:42:16,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.37 vs. 
limit=22.5 2024-08-13 15:42:27,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2194110.0, ans=0.1 2024-08-13 15:42:31,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2194210.0, ans=0.0 2024-08-13 15:42:32,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2194210.0, ans=0.0 2024-08-13 15:42:36,804 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 15:42:41,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2194210.0, ans=0.125 2024-08-13 15:42:45,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2194210.0, ans=0.09899494936611666 2024-08-13 15:42:46,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2194310.0, ans=0.1 2024-08-13 15:42:47,760 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2050, loss[loss=0.09921, beats_loss=0.011, ecapa_loss=0.0001789, whisper_loss=0.08642, over 21646.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001591, whisper_loss=0.0903, over 3861235.83 frames. ], batch size: 91, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:42:49,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2194310.0, ans=0.125 2024-08-13 15:42:56,942 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 15:42:57,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2194310.0, ans=0.125 2024-08-13 15:43:01,135 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 15:43:03,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.358e+01 2.622e+01 3.012e+01 4.492e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-13 15:43:12,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2194410.0, ans=0.125 2024-08-13 15:43:13,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2194410.0, ans=0.07 2024-08-13 15:43:18,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2194510.0, ans=0.125 2024-08-13 15:43:23,744 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 15:43:37,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2194610.0, ans=0.125 2024-08-13 15:43:45,607 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 15:43:47,116 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 15:44:02,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2100, loss[loss=0.1122, beats_loss=0.01236, ecapa_loss=0.0001755, whisper_loss=0.09812, over 17606.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01079, ecapa_loss=0.0001584, whisper_loss=0.09045, over 3862556.12 frames. 
], batch size: 71, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:44:21,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2194910.0, ans=0.125 2024-08-13 15:44:22,195 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 15:44:37,625 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 15:44:47,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2195110.0, ans=0.125 2024-08-13 15:45:14,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2150, loss[loss=0.08894, beats_loss=0.01382, ecapa_loss=8.195e-05, whisper_loss=0.0743, over 18145.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01086, ecapa_loss=0.0001581, whisper_loss=0.09014, over 3850143.33 frames. ], batch size: 66, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:45:29,271 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 15:45:30,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.427e+01 2.711e+01 3.071e+01 5.101e+01, threshold=5.422e+01, percent-clipped=0.0 2024-08-13 15:45:31,054 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 15:45:48,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.42 vs. limit=8.0 2024-08-13 15:45:59,082 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 15:46:02,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2195610.0, ans=0.0 2024-08-13 15:46:09,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2195610.0, ans=0.1 2024-08-13 15:46:16,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2195710.0, ans=0.125 2024-08-13 15:46:23,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2195710.0, ans=0.125 2024-08-13 15:46:29,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2200, loss[loss=0.09184, beats_loss=0.01057, ecapa_loss=0.0001547, whisper_loss=0.07972, over 13486.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001571, whisper_loss=0.0904, over 3804312.46 frames. ], batch size: 53, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:46:29,674 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 15:46:31,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2195810.0, ans=0.1 2024-08-13 15:46:32,627 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 15:46:42,266 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 15:46:48,982 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 15:47:05,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2196010.0, ans=0.125 2024-08-13 15:47:24,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2024-08-13 15:47:26,152 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-13 15:47:26,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=12.0 2024-08-13 15:47:30,741 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 15:47:45,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2250, loss[loss=0.09676, beats_loss=0.01003, ecapa_loss=0.0001769, whisper_loss=0.08496, over 20844.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0109, ecapa_loss=0.0001591, whisper_loss=0.09012, over 3797608.68 frames. ], batch size: 88, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:47:50,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.47 vs. limit=22.5 2024-08-13 15:47:57,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2196310.0, ans=0.0 2024-08-13 15:48:01,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.332e+01 2.611e+01 2.967e+01 5.729e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 15:48:41,837 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-13 15:48:43,312 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
15 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 15:49:00,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2300, loss[loss=0.1039, beats_loss=0.01016, ecapa_loss=0.0001455, whisper_loss=0.09229, over 23460.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01092, ecapa_loss=0.0001588, whisper_loss=0.09052, over 3848001.64 frames. ], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:49:03,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2196810.0, ans=0.0 2024-08-13 15:49:05,268 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 15:49:12,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2196810.0, ans=0.0 2024-08-13 15:49:38,302 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 15:49:48,287 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 15:49:56,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.71 vs. limit=10.0 2024-08-13 15:49:59,379 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 15:50:07,399 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 15:50:14,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2350, loss[loss=0.1077, beats_loss=0.01135, ecapa_loss=0.0001854, whisper_loss=0.09447, over 22572.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.00016, whisper_loss=0.09144, over 3856053.05 frames. 
], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:50:16,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2197310.0, ans=0.125 2024-08-13 15:50:22,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2197310.0, ans=0.125 2024-08-13 15:50:31,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.477e+01 2.777e+01 3.066e+01 6.337e+01, threshold=5.554e+01, percent-clipped=1.0 2024-08-13 15:50:31,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2197410.0, ans=0.1 2024-08-13 15:50:44,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-13 15:51:15,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.07 vs. limit=22.5 2024-08-13 15:51:25,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2197710.0, ans=0.125 2024-08-13 15:51:27,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2197710.0, ans=0.125 2024-08-13 15:51:30,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2400, loss[loss=0.09941, beats_loss=0.008749, ecapa_loss=0.0001701, whisper_loss=0.08896, over 17351.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001602, whisper_loss=0.09166, over 3882404.24 frames. 
], batch size: 65, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:51:40,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2197810.0, ans=0.125 2024-08-13 15:52:08,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2198010.0, ans=0.125 2024-08-13 15:52:09,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2024-08-13 15:52:29,425 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 15:52:39,849 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 15:52:41,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2198310.0, ans=0.125 2024-08-13 15:52:42,503 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2450, loss[loss=0.1066, beats_loss=0.01102, ecapa_loss=0.0001518, whisper_loss=0.09401, over 22343.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.000161, whisper_loss=0.09141, over 3866952.47 frames. ], batch size: 90, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:52:47,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2198310.0, ans=0.0 2024-08-13 15:52:48,854 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
16 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 15:52:54,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2198310.0, ans=0.0 2024-08-13 15:52:58,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.483e+01 2.773e+01 3.111e+01 4.520e+01, threshold=5.546e+01, percent-clipped=0.0 2024-08-13 15:53:00,250 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 15:53:05,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2198410.0, ans=0.0 2024-08-13 15:53:06,946 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-13 15:53:14,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2198510.0, ans=0.125 2024-08-13 15:53:18,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2198510.0, ans=0.125 2024-08-13 15:53:24,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2198610.0, ans=0.05 2024-08-13 15:53:36,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2024-08-13 15:53:38,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2198710.0, ans=0.1 2024-08-13 15:53:53,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2500, loss[loss=0.1074, beats_loss=0.01173, ecapa_loss=0.0001952, whisper_loss=0.09374, over 22448.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001601, whisper_loss=0.0914, over 3859245.64 frames. 
], batch size: 93, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:53:56,532 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 15:54:02,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2198810.0, ans=0.0 2024-08-13 15:54:06,904 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 15:54:08,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2198910.0, ans=0.0 2024-08-13 15:54:57,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2199210.0, ans=0.125 2024-08-13 15:55:01,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2199210.0, ans=0.125 2024-08-13 15:55:06,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2550, loss[loss=0.1142, beats_loss=0.009474, ecapa_loss=0.0001655, whisper_loss=0.1031, over 21760.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01069, ecapa_loss=0.0001603, whisper_loss=0.09162, over 3877529.24 frames. ], batch size: 86, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:55:14,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2199310.0, ans=0.0 2024-08-13 15:55:16,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2199310.0, ans=0.125 2024-08-13 15:55:19,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.73 vs. 
limit=15.0 2024-08-13 15:55:21,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.351e+01 2.676e+01 3.107e+01 6.569e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 15:55:38,053 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 15:56:01,653 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 15 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 15:56:01,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2199610.0, ans=0.125 2024-08-13 15:56:01,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2199610.0, ans=0.125 2024-08-13 15:56:17,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2199810.0, ans=0.0 2024-08-13 15:56:17,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2600, loss[loss=0.114, beats_loss=0.008819, ecapa_loss=0.0001675, whisper_loss=0.1036, over 15831.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.000161, whisper_loss=0.09177, over 3876769.99 frames. ], batch size: 60, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:56:21,011 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 15:56:28,788 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
22 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 15:56:29,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2199810.0, ans=0.2 2024-08-13 15:56:37,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2199910.0, ans=0.125 2024-08-13 15:56:43,987 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-220000.pt 2024-08-13 15:56:49,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2200010.0, ans=0.125 2024-08-13 15:56:59,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2200010.0, ans=0.1 2024-08-13 15:57:04,723 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 15:57:06,471 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 15:57:32,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2200310.0, ans=0.1 2024-08-13 15:57:33,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2650, loss[loss=0.08502, beats_loss=0.01404, ecapa_loss=0.0001103, whisper_loss=0.06987, over 17858.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001607, whisper_loss=0.09111, over 3836907.76 frames. ], batch size: 70, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:57:35,983 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 15:57:43,969 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 15:57:44,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2200310.0, ans=0.0 2024-08-13 15:57:49,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.280e+01 2.561e+01 2.894e+01 4.049e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-13 15:57:54,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2200410.0, ans=0.0 2024-08-13 15:58:10,279 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 15:58:14,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2200510.0, ans=0.2 2024-08-13 15:58:44,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2200710.0, ans=0.015 2024-08-13 15:58:52,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2700, loss[loss=0.1158, beats_loss=0.007771, ecapa_loss=0.0001625, whisper_loss=0.1064, over 19139.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0108, ecapa_loss=0.0001599, whisper_loss=0.09014, over 3842524.90 frames. ], batch size: 75, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:58:52,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2200810.0, ans=0.2 2024-08-13 15:58:58,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2200810.0, ans=0.125 2024-08-13 15:59:03,953 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 15:59:05,505 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 15:59:07,289 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 15:59:24,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2200910.0, ans=0.2 2024-08-13 15:59:30,008 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 16:00:00,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2201210.0, ans=0.125 2024-08-13 16:00:04,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2201210.0, ans=6.0 2024-08-13 16:00:17,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2750, loss[loss=0.09362, beats_loss=0.01261, ecapa_loss=0.0001478, whisper_loss=0.07954, over 21564.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001606, whisper_loss=0.09083, over 3859295.20 frames. ], batch size: 89, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:00:34,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.357e+01 2.643e+01 3.055e+01 4.900e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-13 16:00:56,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2201510.0, ans=0.1 2024-08-13 16:00:56,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2201510.0, ans=0.125 2024-08-13 16:01:02,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2201610.0, ans=0.0 2024-08-13 16:01:04,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.49 vs. 
limit=22.5 2024-08-13 16:01:07,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-13 16:01:09,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=22.5 2024-08-13 16:01:27,021 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 16:01:33,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2201810.0, ans=0.125 2024-08-13 16:01:33,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2800, loss[loss=0.07897, beats_loss=0.0137, ecapa_loss=0.0001248, whisper_loss=0.06402, over 16728.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01075, ecapa_loss=0.0001608, whisper_loss=0.09133, over 3851113.39 frames. ], batch size: 64, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:01:48,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2201910.0, ans=0.2 2024-08-13 16:01:54,134 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 16:02:09,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2202010.0, ans=0.2 2024-08-13 16:02:10,793 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 16:02:33,836 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 16:02:50,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2850, loss[loss=0.1119, beats_loss=0.009337, ecapa_loss=0.0001783, whisper_loss=0.1008, over 23005.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001591, whisper_loss=0.09121, over 3853310.81 frames. ], batch size: 93, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:03:08,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.307e+01 2.674e+01 3.004e+01 5.549e+01, threshold=5.349e+01, percent-clipped=1.0 2024-08-13 16:03:18,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2202410.0, ans=0.1 2024-08-13 16:03:20,513 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:03:24,485 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 16:03:40,657 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 16:03:42,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2202610.0, ans=0.0 2024-08-13 16:03:44,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2202610.0, ans=0.2 2024-08-13 16:04:10,244 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2900, loss[loss=0.07335, beats_loss=0.01411, ecapa_loss=0.0001371, whisper_loss=0.05787, over 13860.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001614, whisper_loss=0.09071, over 3842199.52 frames. ], batch size: 55, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:04:32,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.93 vs. 
limit=12.0 2024-08-13 16:04:33,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2202910.0, ans=0.125 2024-08-13 16:04:46,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2203010.0, ans=0.0 2024-08-13 16:04:51,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2024-08-13 16:04:58,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2203110.0, ans=0.125 2024-08-13 16:05:02,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2203110.0, ans=0.125 2024-08-13 16:05:04,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.83 vs. limit=10.0 2024-08-13 16:05:08,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2203210.0, ans=0.0 2024-08-13 16:05:12,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2203210.0, ans=0.125 2024-08-13 16:05:19,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2203210.0, ans=0.125 2024-08-13 16:05:19,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2203210.0, ans=0.125 2024-08-13 16:05:21,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2203210.0, ans=0.0 2024-08-13 16:05:21,978 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 16:05:23,051 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 2950, loss[loss=0.09922, beats_loss=0.01195, ecapa_loss=0.0001429, whisper_loss=0.08584, over 18842.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01085, ecapa_loss=0.0001609, whisper_loss=0.09056, over 3858123.42 frames. ], batch size: 77, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:05:38,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.341e+01 2.613e+01 3.038e+01 5.265e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-13 16:05:40,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2203410.0, ans=0.125 2024-08-13 16:06:20,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2203710.0, ans=0.1 2024-08-13 16:06:22,817 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 16:06:25,351 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 16:06:29,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2203710.0, ans=0.0 2024-08-13 16:06:29,622 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.862e+00 2024-08-13 16:06:32,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3000, loss[loss=0.1009, beats_loss=0.008377, ecapa_loss=0.000239, whisper_loss=0.09014, over 17303.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01082, ecapa_loss=0.0001613, whisper_loss=0.09058, over 3873234.01 frames. 
], batch size: 77, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:06:32,162 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 16:07:12,407 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2474, over 922467.00 frames. 2024-08-13 16:07:30,379 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on SV_voxceleb1: loss=0.004334, beats_loss=0, ecapa_loss=0.0004334, whisper_loss=0, over 939242.00 frames. 2024-08-13 16:09:55,735 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on AT_audioset: loss=0.02373, beats_loss=0.02373, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 16:09:55,740 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-13 16:09:57,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2203810.0, ans=0.125 2024-08-13 16:10:04,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2203810.0, ans=0.0 2024-08-13 16:10:10,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=12.0 2024-08-13 16:10:26,588 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 16:10:26,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2203910.0, ans=0.0 2024-08-13 16:10:32,237 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 16:10:36,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-08-13 16:11:23,894 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
36 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 16:11:25,783 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 16:11:27,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3050, loss[loss=0.1119, beats_loss=0.009264, ecapa_loss=0.0001639, whisper_loss=0.101, over 17523.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001618, whisper_loss=0.09087, over 3906787.26 frames. ], batch size: 67, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:11:29,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2204310.0, ans=0.2 2024-08-13 16:11:29,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2204310.0, ans=0.5 2024-08-13 16:11:33,184 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 16:11:42,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.448e+01 2.786e+01 3.089e+01 5.850e+01, threshold=5.572e+01, percent-clipped=2.0 2024-08-13 16:11:50,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2204410.0, ans=0.1 2024-08-13 16:12:04,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2204510.0, ans=0.125 2024-08-13 16:12:06,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2204610.0, ans=0.0 2024-08-13 16:12:09,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2204610.0, ans=0.2 2024-08-13 16:12:16,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2204610.0, ans=0.125 2024-08-13 16:12:25,166 INFO 
[scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2024-08-13 16:12:33,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-13 16:12:34,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2204810.0, ans=0.2 2024-08-13 16:12:35,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3100, loss[loss=0.08767, beats_loss=0.0131, ecapa_loss=0.0001392, whisper_loss=0.07318, over 16854.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01088, ecapa_loss=0.0001618, whisper_loss=0.09127, over 3908334.97 frames. ], batch size: 67, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:12:38,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2204810.0, ans=0.0 2024-08-13 16:12:39,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2204810.0, ans=0.125 2024-08-13 16:12:39,894 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:12:49,281 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 16:12:52,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2204910.0, ans=0.125 2024-08-13 16:12:58,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2204910.0, ans=0.125 2024-08-13 16:13:20,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2205110.0, ans=0.1 2024-08-13 16:13:29,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2205110.0, ans=0.125 2024-08-13 16:13:31,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2205210.0, ans=15.0 2024-08-13 16:13:37,487 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 16:13:38,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=22.5 2024-08-13 16:13:43,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2205210.0, ans=0.07 2024-08-13 16:13:45,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3150, loss[loss=0.1121, beats_loss=0.009604, ecapa_loss=0.0001552, whisper_loss=0.1009, over 19198.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001625, whisper_loss=0.09145, over 3895383.39 frames. 
], batch size: 74, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:13:46,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2205310.0, ans=0.1 2024-08-13 16:14:00,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.370e+01 2.702e+01 3.002e+01 4.700e+01, threshold=5.405e+01, percent-clipped=0.0 2024-08-13 16:14:32,401 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 16:14:56,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3200, loss[loss=0.1041, beats_loss=0.01178, ecapa_loss=0.0001706, whisper_loss=0.09059, over 17444.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001623, whisper_loss=0.09146, over 3892647.18 frames. ], batch size: 70, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:15:10,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.55 vs. limit=22.5 2024-08-13 16:15:15,772 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 16:15:32,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2206010.0, ans=0.0 2024-08-13 16:15:37,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2206010.0, ans=0.125 2024-08-13 16:15:37,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2206010.0, ans=0.0 2024-08-13 16:15:42,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.99 vs. 
limit=15.0 2024-08-13 16:15:44,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-13 16:15:50,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2206110.0, ans=0.05 2024-08-13 16:16:06,831 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 16:16:09,593 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 16:16:10,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3250, loss[loss=0.09569, beats_loss=0.01255, ecapa_loss=0.00018, whisper_loss=0.08134, over 18910.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01084, ecapa_loss=0.0001611, whisper_loss=0.09235, over 3871293.05 frames. ], batch size: 77, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:16:21,641 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 16:16:25,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.428e+01 2.754e+01 3.023e+01 4.086e+01, threshold=5.507e+01, percent-clipped=0.0 2024-08-13 16:16:38,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-08-13 16:16:45,522 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 16:17:22,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3300, loss[loss=0.09361, beats_loss=0.008358, ecapa_loss=0.0002107, whisper_loss=0.08315, over 21477.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01093, ecapa_loss=0.0001607, whisper_loss=0.09135, over 3862854.71 frames. 
], batch size: 91, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:17:24,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2206810.0, ans=0.125 2024-08-13 16:17:29,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2206810.0, ans=0.04949747468305833 2024-08-13 16:17:57,923 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 16:18:18,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-13 16:18:28,464 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:18:33,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3350, loss[loss=0.1249, beats_loss=0.007563, ecapa_loss=0.0001961, whisper_loss=0.1154, over 17045.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01096, ecapa_loss=0.0001612, whisper_loss=0.09094, over 3860164.37 frames. ], batch size: 68, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:18:44,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2207310.0, ans=0.125 2024-08-13 16:18:49,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.421e+01 2.639e+01 2.919e+01 4.017e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-13 16:18:55,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2207410.0, ans=0.2 2024-08-13 16:19:04,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2207510.0, ans=0.125 2024-08-13 16:19:20,924 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-13 16:19:24,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2207610.0, ans=0.125 2024-08-13 16:19:27,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2207610.0, ans=0.1 2024-08-13 16:19:35,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2207710.0, ans=0.0 2024-08-13 16:19:38,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-13 16:19:39,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2207710.0, ans=0.0 2024-08-13 16:19:41,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-13 16:19:44,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2207810.0, ans=0.125 2024-08-13 16:19:45,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3400, loss[loss=0.09104, beats_loss=0.008507, ecapa_loss=0.000178, whisper_loss=0.08075, over 15078.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01094, ecapa_loss=0.0001618, whisper_loss=0.09011, over 3849651.66 frames. ], batch size: 58, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:19:46,166 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 14 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 16:19:52,565 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-13 16:20:19,928 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.629e+05 2024-08-13 16:20:43,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2024-08-13 16:20:51,162 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-13 16:20:56,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3450, loss[loss=0.1149, beats_loss=0.01111, ecapa_loss=0.0001734, whisper_loss=0.102, over 22518.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01093, ecapa_loss=0.0001619, whisper_loss=0.08956, over 3871668.73 frames. ], batch size: 91, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:21:05,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2024-08-13 16:21:11,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.477e+01 2.807e+01 3.322e+01 1.527e+02, threshold=5.614e+01, percent-clipped=5.0 2024-08-13 16:21:13,106 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 16:21:14,560 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-13 16:21:20,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2208410.0, ans=0.05 2024-08-13 16:21:27,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=12.0 2024-08-13 16:21:35,758 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 16:21:36,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2208510.0, ans=0.2 2024-08-13 16:21:36,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2208510.0, ans=0.0 2024-08-13 16:21:46,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2024-08-13 16:21:52,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2208710.0, ans=0.1 2024-08-13 16:21:54,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2208710.0, ans=0.2 2024-08-13 16:21:58,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-13 16:22:06,932 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3500, loss[loss=0.1031, beats_loss=0.009064, ecapa_loss=0.0002132, whisper_loss=0.09187, over 22806.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01084, ecapa_loss=0.0001633, whisper_loss=0.09025, over 3890099.38 frames. ], batch size: 94, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:22:09,231 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 16:22:12,729 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:22:30,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2208910.0, ans=0.0 2024-08-13 16:22:32,739 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
20 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-13 16:22:35,348 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 16:22:45,261 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 16:22:51,593 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-13 16:23:05,098 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 10 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 16:23:06,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-13 16:23:13,602 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 16:23:20,195 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3550, loss[loss=0.1109, beats_loss=0.009753, ecapa_loss=0.0001823, whisper_loss=0.09936, over 20349.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01086, ecapa_loss=0.0001639, whisper_loss=0.09002, over 3886277.52 frames. ], batch size: 83, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:23:22,504 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 16:23:23,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.788e+01 2024-08-13 16:23:25,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2209310.0, ans=0.2 2024-08-13 16:23:26,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2209310.0, ans=0.0 2024-08-13 16:23:35,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2209410.0, ans=0.0 2024-08-13 16:23:36,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.460e+01 2.758e+01 3.003e+01 5.341e+01, threshold=5.516e+01, percent-clipped=0.0 2024-08-13 16:23:53,327 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 16:24:04,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2209610.0, ans=0.2 2024-08-13 16:24:14,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=12.0 2024-08-13 16:24:16,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2209610.0, ans=0.1 2024-08-13 16:24:34,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3600, loss[loss=0.1055, beats_loss=0.01368, ecapa_loss=0.0001603, whisper_loss=0.09026, over 15971.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01091, ecapa_loss=0.000163, whisper_loss=0.09, over 3884170.87 frames. ], batch size: 64, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:24:34,964 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
39 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 16:24:39,334 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 16:25:15,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-08-13 16:25:19,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2210110.0, ans=0.125 2024-08-13 16:25:21,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2024-08-13 16:25:27,682 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 16:25:40,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2210210.0, ans=0.0 2024-08-13 16:25:41,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2210210.0, ans=0.0 2024-08-13 16:25:46,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3650, loss[loss=0.1217, beats_loss=0.009563, ecapa_loss=0.0001557, whisper_loss=0.1106, over 22883.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01096, ecapa_loss=0.0001626, whisper_loss=0.09078, over 3918008.03 frames. ], batch size: 89, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:25:46,995 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 16:25:55,039 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 16:26:02,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.319e+01 2.686e+01 3.119e+01 4.845e+01, threshold=5.372e+01, percent-clipped=0.0 2024-08-13 16:26:03,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-13 16:26:14,963 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 16:26:16,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2210510.0, ans=0.0 2024-08-13 16:26:30,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2210610.0, ans=0.125 2024-08-13 16:26:41,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2210710.0, ans=0.1 2024-08-13 16:26:50,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2210710.0, ans=0.125 2024-08-13 16:26:50,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2210710.0, ans=0.0 2024-08-13 16:26:56,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3700, loss[loss=0.1085, beats_loss=0.009074, ecapa_loss=0.0001847, whisper_loss=0.09761, over 18657.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001632, whisper_loss=0.0913, over 3911098.39 frames. ], batch size: 73, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:27:16,180 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 16:27:41,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2211110.0, ans=0.025 2024-08-13 16:27:42,549 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 16:27:48,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2211110.0, ans=0.125 2024-08-13 16:28:03,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3750, loss[loss=0.1007, beats_loss=0.01168, ecapa_loss=0.000138, whisper_loss=0.0876, over 23576.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001634, whisper_loss=0.09189, over 3936629.10 frames. ], batch size: 94, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:28:11,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2211310.0, ans=0.2 2024-08-13 16:28:17,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.410e+01 2.677e+01 3.009e+01 6.113e+01, threshold=5.354e+01, percent-clipped=1.0 2024-08-13 16:28:34,727 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.687e-02 2024-08-13 16:28:36,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2211510.0, ans=0.1 2024-08-13 16:28:37,330 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 16:28:46,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2211610.0, ans=0.125 2024-08-13 16:28:50,301 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 17 from Vox, 39 from AS 2024-08-13 16:28:51,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2211610.0, ans=0.2 2024-08-13 16:29:08,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3800, loss[loss=0.1157, beats_loss=0.009464, ecapa_loss=0.0001836, whisper_loss=0.1044, over 16893.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01095, ecapa_loss=0.0001619, whisper_loss=0.09133, over 3953586.72 frames. ], batch size: 67, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:29:13,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2211810.0, ans=0.0 2024-08-13 16:29:16,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2211810.0, ans=0.125 2024-08-13 16:29:32,947 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 23 from Vox, 30 from AS 2024-08-13 16:29:36,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2024-08-13 16:29:40,544 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 24 from Vox, 26 from AS 2024-08-13 16:29:40,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2212010.0, ans=0.05 2024-08-13 16:29:44,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2212010.0, ans=0.125 2024-08-13 16:29:51,139 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
23 from LS+wenet, 21 from Vox, 34 from AS 2024-08-13 16:30:11,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2212210.0, ans=0.1 2024-08-13 16:30:13,163 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3850, loss[loss=0.1167, beats_loss=0.01187, ecapa_loss=0.0001596, whisper_loss=0.1033, over 22761.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001621, whisper_loss=0.09147, over 3936879.39 frames. ], batch size: 92, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:30:21,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2212310.0, ans=0.2 2024-08-13 16:30:23,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2212310.0, ans=0.0 2024-08-13 16:30:27,497 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.456e+01 2.760e+01 3.181e+01 8.437e+01, threshold=5.521e+01, percent-clipped=2.0 2024-08-13 16:30:28,865 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 from AS 2024-08-13 16:30:39,422 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 from AS 2024-08-13 16:30:39,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2212510.0, ans=0.125 2024-08-13 16:30:44,786 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
36 from LS+wenet, 25 from Vox, 33 from AS 2024-08-13 16:30:51,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2212610.0, ans=0.1 2024-08-13 16:30:54,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2212610.0, ans=0.1 2024-08-13 16:31:11,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. limit=10.0 2024-08-13 16:31:13,850 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-13 16:31:18,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3900, loss[loss=0.1176, beats_loss=0.008267, ecapa_loss=0.0001615, whisper_loss=0.1077, over 18490.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01082, ecapa_loss=0.0001631, whisper_loss=0.09224, over 3943311.94 frames. ], batch size: 68, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:31:21,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2212810.0, ans=0.125 2024-08-13 16:31:36,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2212910.0, ans=0.0 2024-08-13 16:31:41,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2212910.0, ans=0.125 2024-08-13 16:31:45,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2213010.0, ans=0.125 2024-08-13 16:31:46,114 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts.
27 from LS+wenet, 15 from Vox, 32 from AS 2024-08-13 16:31:47,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2213010.0, ans=0.5 2024-08-13 16:31:48,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.50 vs. limit=10.0 2024-08-13 16:31:51,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2213010.0, ans=0.125 2024-08-13 16:31:56,693 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-13 16:32:09,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2213210.0, ans=0.1 2024-08-13 16:32:23,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 3950, loss[loss=0.1144, beats_loss=0.01028, ecapa_loss=0.0001543, whisper_loss=0.1025, over 23856.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01075, ecapa_loss=0.0001633, whisper_loss=0.09283, over 3957537.56 frames. ], batch size: 94, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:32:35,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.983e+05 2024-08-13 16:32:35,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2213410.0, ans=15.0 2024-08-13 16:32:37,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.503e+01 2.824e+01 3.168e+01 4.630e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 16:32:41,670 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 18 from Vox, 38 from AS 2024-08-13 16:32:45,738 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts.
16 from LS+wenet, 24 from Vox, 31 from AS 2024-08-13 16:32:57,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2213510.0, ans=0.1 2024-08-13 16:33:12,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2213610.0, ans=0.125 2024-08-13 16:33:24,797 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 from AS 2024-08-13 16:33:25,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2213710.0, ans=0.125 2024-08-13 16:33:28,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4000, loss[loss=0.09933, beats_loss=0.01131, ecapa_loss=0.0001419, whisper_loss=0.08661, over 17799.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01074, ecapa_loss=0.0001641, whisper_loss=0.09208, over 3925867.92 frames. ], batch size: 71, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:33:39,384 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 25 from Vox, 27 from AS 2024-08-13 16:33:59,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2214010.0, ans=0.125 2024-08-13 16:34:03,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2214010.0, ans=0.0 2024-08-13 16:34:04,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2214010.0, ans=0.1 2024-08-13 16:34:05,532 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 from AS 2024-08-13 16:34:12,021 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts.
21 from LS+wenet, 14 from Vox, 35 from AS 2024-08-13 16:34:14,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2214110.0, ans=0.05 2024-08-13 16:34:17,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2214110.0, ans=0.0 2024-08-13 16:34:19,663 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 from AS 2024-08-13 16:34:26,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2214210.0, ans=0.1 2024-08-13 16:34:30,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2214210.0, ans=0.125 2024-08-13 16:34:33,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4050, loss[loss=0.1009, beats_loss=0.009647, ecapa_loss=0.0001725, whisper_loss=0.08957, over 18571.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0107, ecapa_loss=0.0001648, whisper_loss=0.09204, over 3907001.33 frames.
], batch size: 73, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:34:34,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2214310.0, ans=0.125 2024-08-13 16:34:42,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2214310.0, ans=0.125 2024-08-13 16:34:48,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.510e+01 2.777e+01 3.045e+01 5.508e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-13 16:34:56,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2214410.0, ans=0.125 2024-08-13 16:35:16,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2214610.0, ans=15.0 2024-08-13 16:35:28,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2214710.0, ans=0.125 2024-08-13 16:35:29,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-13 16:35:33,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.34 vs. limit=22.5 2024-08-13 16:35:38,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4100, loss[loss=0.1005, beats_loss=0.01134, ecapa_loss=0.0001525, whisper_loss=0.08767, over 23390.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01071, ecapa_loss=0.0001657, whisper_loss=0.09243, over 3887622.16 frames. 
], batch size: 94, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:35:40,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2214810.0, ans=0.125 2024-08-13 16:35:53,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2214910.0, ans=0.0 2024-08-13 16:36:28,936 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:36:29,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2215110.0, ans=0.125 2024-08-13 16:36:32,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2215210.0, ans=0.1 2024-08-13 16:36:43,817 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4150, loss[loss=0.1214, beats_loss=0.008571, ecapa_loss=0.0001559, whisper_loss=0.1113, over 20733.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001661, whisper_loss=0.09161, over 3857339.58 frames. 
], batch size: 80, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:36:45,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2215310.0, ans=0.2 2024-08-13 16:36:54,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2215310.0, ans=0.1 2024-08-13 16:36:56,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2215410.0, ans=0.125 2024-08-13 16:36:57,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.337e+01 2.557e+01 2.975e+01 8.257e+01, threshold=5.114e+01, percent-clipped=2.0 2024-08-13 16:37:15,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-08-13 16:37:19,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2215510.0, ans=0.125 2024-08-13 16:37:27,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-08-13 16:37:41,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.15 vs. 
limit=10.0 2024-08-13 16:37:45,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2215710.0, ans=0.125 2024-08-13 16:37:46,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2215710.0, ans=0.125 2024-08-13 16:37:48,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4200, loss[loss=0.1053, beats_loss=0.01044, ecapa_loss=0.0001532, whisper_loss=0.09334, over 23433.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001647, whisper_loss=0.09166, over 3872658.01 frames. ], batch size: 92, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:37:55,425 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 from AS 2024-08-13 16:38:36,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-13 16:38:56,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4250, loss[loss=0.1097, beats_loss=0.01094, ecapa_loss=0.0001516, whisper_loss=0.09727, over 22270.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001639, whisper_loss=0.09142, over 3878399.53 frames. ], batch size: 89, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:39:01,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2216310.0, ans=0.125 2024-08-13 16:39:12,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.639e+01 2.854e+01 4.176e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-13 16:39:12,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2216410.0, ans=0.125 2024-08-13 16:39:18,841 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts.
22 from LS+wenet, 11 from Vox, 21 from AS 2024-08-13 16:39:23,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-13 16:39:29,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2216510.0, ans=0.0 2024-08-13 16:40:05,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2216710.0, ans=0.125 2024-08-13 16:40:11,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4300, loss[loss=0.08456, beats_loss=0.01184, ecapa_loss=0.0001649, whisper_loss=0.07108, over 23284.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001644, whisper_loss=0.09135, over 3894287.44 frames. ], batch size: 93, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:40:14,956 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS 2024-08-13 16:40:18,042 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-13 16:40:58,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2217110.0, ans=0.125 2024-08-13 16:41:15,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2217210.0, ans=0.0 2024-08-13 16:41:25,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2217210.0, ans=10.0 2024-08-13 16:41:27,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4350, loss[loss=0.0668, beats_loss=0.01286, ecapa_loss=0.0001395, whisper_loss=0.05255, over 18558.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01083, ecapa_loss=0.0001645, whisper_loss=0.09079, over 3887952.82 frames.
], batch size: 76, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:41:32,880 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 16:41:34,042 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 from AS 2024-08-13 16:41:34,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2217310.0, ans=0.125 2024-08-13 16:41:41,873 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.464e+01 2.794e+01 3.090e+01 4.694e+01, threshold=5.588e+01, percent-clipped=0.0 2024-08-13 16:41:46,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2217410.0, ans=0.125 2024-08-13 16:41:53,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2217510.0, ans=0.125 2024-08-13 16:41:56,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=22.5 2024-08-13 16:42:00,326 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 from AS 2024-08-13 16:42:15,759 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 from AS 2024-08-13 16:42:32,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4400, loss[loss=0.1049, beats_loss=0.00841, ecapa_loss=0.0001654, whisper_loss=0.09484, over 18881.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001659, whisper_loss=0.09173, over 3914198.15 frames.
], batch size: 71, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:42:33,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2217810.0, ans=0.125 2024-08-13 16:42:34,327 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 from AS 2024-08-13 16:42:35,512 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 from AS 2024-08-13 16:42:43,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2217810.0, ans=0.125 2024-08-13 16:42:47,346 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS 2024-08-13 16:42:50,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2217910.0, ans=0.2 2024-08-13 16:42:51,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2217910.0, ans=0.125 2024-08-13 16:43:37,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4450, loss[loss=0.1077, beats_loss=0.01223, ecapa_loss=0.0001165, whisper_loss=0.09427, over 23752.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01065, ecapa_loss=0.0001645, whisper_loss=0.09223, over 3909639.67 frames. ], batch size: 91, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:43:44,233 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
32 from LS+wenet, 23 from Vox, 36 from AS 2024-08-13 16:43:49,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2218410.0, ans=0.125 2024-08-13 16:43:51,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.377e+01 2.551e+01 3.070e+01 5.212e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-13 16:43:53,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2218410.0, ans=0.125 2024-08-13 16:44:01,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2218410.0, ans=0.05 2024-08-13 16:44:10,345 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:44:29,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2218710.0, ans=0.125 2024-08-13 16:44:37,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2218710.0, ans=15.0 2024-08-13 16:44:41,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4500, loss[loss=0.1224, beats_loss=0.009563, ecapa_loss=0.000182, whisper_loss=0.111, over 23059.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01066, ecapa_loss=0.0001636, whisper_loss=0.09249, over 3934663.77 frames. ], batch size: 93, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:44:46,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2218810.0, ans=0.1 2024-08-13 16:44:48,439 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts.
25 from LS+wenet, 20 from Vox, 39 from AS 2024-08-13 16:44:57,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2218910.0, ans=0.0 2024-08-13 16:45:01,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2218910.0, ans=10.0 2024-08-13 16:45:01,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2218910.0, ans=0.125 2024-08-13 16:45:09,532 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 from AS 2024-08-13 16:45:21,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2024-08-13 16:45:22,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2219110.0, ans=0.125 2024-08-13 16:45:24,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2219110.0, ans=0.125 2024-08-13 16:45:35,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.17 vs. limit=22.5 2024-08-13 16:45:42,868 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 from AS 2024-08-13 16:45:49,267 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 from AS 2024-08-13 16:45:55,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4550, loss[loss=0.09349, beats_loss=0.01416, ecapa_loss=0.0001157, whisper_loss=0.07817, over 22328.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01074, ecapa_loss=0.000164, whisper_loss=0.09168, over 3915455.06 frames.
], batch size: 88, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:45:56,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2219310.0, ans=0.0 2024-08-13 16:46:00,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2219310.0, ans=0.125 2024-08-13 16:46:05,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2219310.0, ans=0.1 2024-08-13 16:46:12,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.447e+01 2.788e+01 3.187e+01 5.560e+01, threshold=5.575e+01, percent-clipped=2.0 2024-08-13 16:46:13,708 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 16:46:15,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2219410.0, ans=0.035 2024-08-13 16:46:16,119 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS 2024-08-13 16:46:22,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2219410.0, ans=0.125 2024-08-13 16:46:22,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2024-08-13 16:46:30,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2219510.0, ans=0.07 2024-08-13 16:46:34,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2219510.0, ans=0.2 2024-08-13 16:46:37,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs.
limit=15.0 2024-08-13 16:46:47,841 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 from AS 2024-08-13 16:47:26,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=22.5 2024-08-13 16:47:31,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4600, loss[loss=0.1135, beats_loss=0.01194, ecapa_loss=0.0001285, whisper_loss=0.1003, over 19942.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001644, whisper_loss=0.09194, over 3920446.74 frames. ], batch size: 76, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:47:37,061 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 25 from Vox, 28 from AS 2024-08-13 16:47:46,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2219810.0, ans=0.1 2024-08-13 16:47:50,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2219810.0, ans=0.125 2024-08-13 16:47:57,273 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS 2024-08-13 16:47:57,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2219910.0, ans=0.2 2024-08-13 16:48:04,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2219910.0, ans=0.125 2024-08-13 16:48:06,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-13 16:48:22,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.42 vs.
limit=22.5 2024-08-13 16:48:27,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2220010.0, ans=0.125 2024-08-13 16:48:38,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2220110.0, ans=0.125 2024-08-13 16:49:10,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2220210.0, ans=0.125 2024-08-13 16:49:17,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.38 vs. limit=15.0 2024-08-13 16:49:17,617 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 from AS 2024-08-13 16:49:22,910 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 from AS 2024-08-13 16:49:24,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4650, loss[loss=0.1059, beats_loss=0.01092, ecapa_loss=0.0001872, whisper_loss=0.09311, over 18138.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001641, whisper_loss=0.09196, over 3923592.03 frames. ], batch size: 75, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:49:48,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=2220410.0, ans=12.0 2024-08-13 16:49:50,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.519e+01 2.721e+01 2.978e+01 4.976e+01, threshold=5.443e+01, percent-clipped=0.0 2024-08-13 16:50:03,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2220410.0, ans=0.125 2024-08-13 16:50:32,690 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
30 from LS+wenet, 32 from Vox, 32 from AS 2024-08-13 16:50:52,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2220610.0, ans=0.1 2024-08-13 16:50:53,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2220710.0, ans=0.1 2024-08-13 16:51:10,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2220710.0, ans=0.125 2024-08-13 16:51:19,506 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4700, loss[loss=0.07995, beats_loss=0.01108, ecapa_loss=0.0001329, whisper_loss=0.06753, over 14248.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01074, ecapa_loss=0.0001641, whisper_loss=0.09179, over 3892986.43 frames. ], batch size: 54, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:51:25,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2220810.0, ans=0.2 2024-08-13 16:51:25,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.54 vs. limit=22.5 2024-08-13 16:52:31,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2221110.0, ans=0.125 2024-08-13 16:52:39,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2221110.0, ans=0.125 2024-08-13 16:52:43,681 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 24 from Vox, 20 from AS 2024-08-13 16:52:54,161 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts.
19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 16:53:01,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4750, loss[loss=0.1249, beats_loss=0.007956, ecapa_loss=0.0002211, whisper_loss=0.1147, over 13916.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001641, whisper_loss=0.09155, over 3918861.72 frames. ], batch size: 56, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:53:13,246 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 16:53:15,081 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 26 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 16:53:17,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.389e+01 2.725e+01 3.065e+01 4.342e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 16:53:25,101 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:53:32,509 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 16:54:14,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4800, loss[loss=0.1215, beats_loss=0.009773, ecapa_loss=0.0001745, whisper_loss=0.11, over 16753.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001641, whisper_loss=0.09134, over 3906341.19 frames. 
], batch size: 65, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:54:16,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2221810.0, ans=0.2 2024-08-13 16:54:43,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2221910.0, ans=0.0 2024-08-13 16:54:59,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2222010.0, ans=0.2 2024-08-13 16:55:19,450 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 16:55:20,710 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-13 16:55:35,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4850, loss[loss=0.08051, beats_loss=0.01034, ecapa_loss=0.0001864, whisper_loss=0.0683, over 17926.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.0001629, whisper_loss=0.09147, over 3930377.93 frames. ], batch size: 73, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:55:52,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.475e+01 2.681e+01 3.157e+01 5.324e+01, threshold=5.362e+01, percent-clipped=0.0 2024-08-13 16:55:57,548 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 16:55:59,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2222410.0, ans=0.2 2024-08-13 16:56:04,048 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 16:56:33,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2024-08-13 16:56:50,752 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 16:56:51,823 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4900, loss[loss=0.1129, beats_loss=0.00822, ecapa_loss=0.000192, whisper_loss=0.1028, over 16189.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.000164, whisper_loss=0.09189, over 3871979.42 frames. ], batch size: 64, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:56:59,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2222810.0, ans=0.2 2024-08-13 16:57:35,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2223010.0, ans=0.1 2024-08-13 16:57:36,236 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 16:58:01,572 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 16:58:02,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.26 vs. limit=15.0 2024-08-13 16:58:08,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 4950, loss[loss=0.1091, beats_loss=0.0103, ecapa_loss=0.0001691, whisper_loss=0.09713, over 17692.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001638, whisper_loss=0.09137, over 3845545.11 frames. ], batch size: 71, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:58:10,377 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 47 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 16:58:17,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-13 16:58:17,882 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 16:58:26,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.318e+01 2.496e+01 2.819e+01 1.833e+02, threshold=4.991e+01, percent-clipped=1.0 2024-08-13 16:58:26,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-13 16:58:36,681 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-13 16:59:00,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2223610.0, ans=0.125 2024-08-13 16:59:07,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2223610.0, ans=0.5 2024-08-13 16:59:08,484 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-13 16:59:08,691 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:59:21,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2223710.0, ans=0.125 2024-08-13 16:59:21,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.83 vs. limit=22.5 2024-08-13 16:59:26,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5000, loss[loss=0.08601, beats_loss=0.01219, ecapa_loss=0.0001166, whisper_loss=0.07265, over 18274.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01071, ecapa_loss=0.0001624, whisper_loss=0.09289, over 3881345.23 frames. 
], batch size: 69, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:59:27,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2223810.0, ans=0.2 2024-08-13 16:59:28,165 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 16:59:37,055 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 16:59:53,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2223910.0, ans=0.0 2024-08-13 16:59:57,758 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 16:59:58,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2224010.0, ans=0.125 2024-08-13 16:59:59,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2224010.0, ans=0.1 2024-08-13 17:00:01,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2224010.0, ans=0.125 2024-08-13 17:00:08,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-13 17:00:29,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-08-13 17:00:41,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5050, loss[loss=0.1194, beats_loss=0.009139, ecapa_loss=0.0001597, whisper_loss=0.1086, over 21044.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01076, ecapa_loss=0.0001626, whisper_loss=0.09294, over 3910340.37 frames. 
], batch size: 83, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:01:00,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.341e+01 2.653e+01 3.152e+01 4.271e+01, threshold=5.307e+01, percent-clipped=0.0 2024-08-13 17:01:04,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2024-08-13 17:01:27,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2224610.0, ans=0.2 2024-08-13 17:01:27,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2224610.0, ans=0.125 2024-08-13 17:01:39,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0 2024-08-13 17:01:49,478 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 17:01:55,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2224710.0, ans=0.1 2024-08-13 17:01:56,389 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 17:01:57,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5100, loss[loss=0.1052, beats_loss=0.008951, ecapa_loss=0.0001678, whisper_loss=0.09455, over 22881.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01074, ecapa_loss=0.0001628, whisper_loss=0.09261, over 3900028.56 frames. ], batch size: 91, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:01:57,883 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 17:02:03,939 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 17:02:20,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2224910.0, ans=0.0 2024-08-13 17:02:24,937 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 17:02:40,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2225010.0, ans=0.2 2024-08-13 17:03:14,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5150, loss[loss=0.1069, beats_loss=0.01038, ecapa_loss=0.000145, whisper_loss=0.09507, over 18407.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01075, ecapa_loss=0.0001624, whisper_loss=0.09315, over 3924336.22 frames. ], batch size: 72, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:03:25,943 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 17:03:29,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.384e+01 2.654e+01 2.972e+01 6.587e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-13 17:03:34,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2225410.0, ans=0.09899494936611666 2024-08-13 17:03:40,185 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.308e-02 2024-08-13 17:03:42,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2225510.0, ans=0.125 2024-08-13 17:04:12,031 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 17:04:28,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5200, loss[loss=0.0794, beats_loss=0.01079, ecapa_loss=0.000138, whisper_loss=0.06724, over 14721.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01073, ecapa_loss=0.0001619, whisper_loss=0.09236, over 3875208.15 frames. ], batch size: 57, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:04:29,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2225810.0, ans=0.1 2024-08-13 17:04:30,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2225810.0, ans=0.125 2024-08-13 17:04:30,937 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 17:05:07,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2024-08-13 17:05:34,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2226210.0, ans=0.2 2024-08-13 17:05:41,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5250, loss[loss=0.1146, beats_loss=0.009783, ecapa_loss=0.0001564, whisper_loss=0.1032, over 18080.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.0001605, whisper_loss=0.09139, over 3902337.03 frames. ], batch size: 70, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:05:53,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2226310.0, ans=0.125 2024-08-13 17:05:58,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.417e+01 2.576e+01 2.914e+01 4.655e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-13 17:06:02,105 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 17:06:07,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2226410.0, ans=0.125 2024-08-13 17:06:25,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2226510.0, ans=0.0 2024-08-13 17:06:33,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2226610.0, ans=0.125 2024-08-13 17:06:59,308 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5300, loss[loss=0.09037, beats_loss=0.01218, ecapa_loss=0.0001726, whisper_loss=0.07647, over 20865.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001612, whisper_loss=0.09156, over 3901160.30 frames. ], batch size: 86, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:07:02,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2226810.0, ans=10.0 2024-08-13 17:07:20,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2024-08-13 17:07:23,416 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.109e+01 2024-08-13 17:07:24,525 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 17:07:26,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=22.5 2024-08-13 17:07:44,364 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
28 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 17:07:46,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2227110.0, ans=0.1 2024-08-13 17:08:12,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2227210.0, ans=0.0 2024-08-13 17:08:16,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5350, loss[loss=0.08878, beats_loss=0.01122, ecapa_loss=0.0001782, whisper_loss=0.07578, over 19424.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001599, whisper_loss=0.09067, over 3873045.49 frames. ], batch size: 83, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:08:18,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2227310.0, ans=0.125 2024-08-13 17:08:26,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2227310.0, ans=0.09899494936611666 2024-08-13 17:08:34,135 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.323e+01 2.552e+01 2.858e+01 4.460e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-13 17:08:53,622 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 17:09:35,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5400, loss[loss=0.09616, beats_loss=0.01054, ecapa_loss=0.0001724, whisper_loss=0.0839, over 18795.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01085, ecapa_loss=0.0001596, whisper_loss=0.08986, over 3862015.74 frames. 
], batch size: 73, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:09:45,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2227810.0, ans=0.2 2024-08-13 17:10:12,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2228010.0, ans=0.1 2024-08-13 17:10:41,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2228210.0, ans=0.0 2024-08-13 17:10:53,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5450, loss[loss=0.1093, beats_loss=0.01089, ecapa_loss=0.0001628, whisper_loss=0.09679, over 19208.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001613, whisper_loss=0.09049, over 3860475.42 frames. ], batch size: 77, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:10:59,192 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 17:11:03,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2024-08-13 17:11:06,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs. 
limit=22.5 2024-08-13 17:11:11,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.403e+01 2.600e+01 2.908e+01 1.736e+02, threshold=5.201e+01, percent-clipped=2.0 2024-08-13 17:11:28,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2228510.0, ans=0.0 2024-08-13 17:11:30,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2228510.0, ans=0.025 2024-08-13 17:11:51,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2228610.0, ans=0.95 2024-08-13 17:12:03,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2228710.0, ans=0.1 2024-08-13 17:12:04,759 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 17:12:08,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=8.0 2024-08-13 17:12:12,311 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5500, loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.000186, whisper_loss=0.09158, over 14586.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001614, whisper_loss=0.09114, over 3859017.41 frames. ], batch size: 58, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:12:22,873 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 17:12:30,185 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 17:12:32,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2228910.0, ans=0.95 2024-08-13 17:12:41,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2229010.0, ans=0.125 2024-08-13 17:12:51,030 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 17:13:06,612 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 17:13:13,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2229210.0, ans=0.0 2024-08-13 17:13:20,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2229210.0, ans=15.0 2024-08-13 17:13:22,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2024-08-13 17:13:23,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2229210.0, ans=0.0 2024-08-13 17:13:30,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5550, loss[loss=0.1159, beats_loss=0.01093, ecapa_loss=0.0001693, whisper_loss=0.1033, over 22153.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001629, whisper_loss=0.09094, over 3894076.71 frames. 
], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:13:33,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2229310.0, ans=0.0 2024-08-13 17:13:38,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2229310.0, ans=0.125 2024-08-13 17:13:42,001 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-13 17:13:43,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2229310.0, ans=0.5 2024-08-13 17:13:45,010 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 17:13:46,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2229410.0, ans=0.125 2024-08-13 17:13:50,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2229410.0, ans=0.125 2024-08-13 17:13:51,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.399e+01 2.709e+01 2.923e+01 5.241e+01, threshold=5.419e+01, percent-clipped=1.0 2024-08-13 17:13:55,106 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 17:13:58,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2229410.0, ans=0.125 2024-08-13 17:14:16,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2229510.0, ans=0.0 2024-08-13 17:14:17,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. 
limit=15.0 2024-08-13 17:14:35,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2229610.0, ans=0.125 2024-08-13 17:14:35,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2229610.0, ans=0.125 2024-08-13 17:14:42,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2229710.0, ans=0.1 2024-08-13 17:14:43,653 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 17:14:54,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5600, loss[loss=0.09578, beats_loss=0.01064, ecapa_loss=0.0001874, whisper_loss=0.08326, over 14226.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001631, whisper_loss=0.09094, over 3891832.95 frames. ], batch size: 58, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:15:01,125 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 17:15:28,895 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 17:15:34,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-08-13 17:15:41,845 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 17:15:43,883 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 17:15:58,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2230210.0, ans=0.125 2024-08-13 17:16:11,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5650, loss[loss=0.1096, beats_loss=0.01038, ecapa_loss=0.0001351, whisper_loss=0.09789, over 24164.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01092, ecapa_loss=0.0001629, whisper_loss=0.09065, over 3902926.70 frames. ], batch size: 93, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:16:24,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2024-08-13 17:16:29,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.435e+01 2.742e+01 3.035e+01 1.015e+02, threshold=5.483e+01, percent-clipped=1.0 2024-08-13 17:16:39,020 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 17:16:42,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2230510.0, ans=0.0 2024-08-13 17:16:59,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2230610.0, ans=0.125 2024-08-13 17:17:29,053 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5700, loss[loss=0.06282, beats_loss=0.01391, ecapa_loss=0.0001173, whisper_loss=0.04774, over 16533.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001643, whisper_loss=0.09081, over 3902340.79 frames. ], batch size: 63, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:17:37,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2230810.0, ans=0.0 2024-08-13 17:17:38,694 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
39 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 17:17:53,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2230910.0, ans=0.09899494936611666 2024-08-13 17:18:21,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2231110.0, ans=0.1 2024-08-13 17:18:37,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2231210.0, ans=0.0 2024-08-13 17:18:44,466 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-13 17:18:48,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5750, loss[loss=0.1099, beats_loss=0.01234, ecapa_loss=0.0001404, whisper_loss=0.09613, over 22332.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0109, ecapa_loss=0.0001637, whisper_loss=0.09081, over 3918383.95 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:18:55,030 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 17:19:07,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+01 2.349e+01 2.635e+01 2.966e+01 1.104e+02, threshold=5.269e+01, percent-clipped=1.0 2024-08-13 17:19:14,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2231410.0, ans=0.035 2024-08-13 17:19:17,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2231410.0, ans=0.125 2024-08-13 17:19:25,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2231510.0, ans=0.1 2024-08-13 17:19:37,985 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
23 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-13 17:19:59,381 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-13 17:20:05,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5800, loss[loss=0.1018, beats_loss=0.01278, ecapa_loss=0.0001519, whisper_loss=0.08749, over 17042.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01092, ecapa_loss=0.0001633, whisper_loss=0.09058, over 3883340.13 frames. ], batch size: 68, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:20:28,650 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-13 17:20:37,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5 2024-08-13 17:20:40,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-13 17:21:10,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2232210.0, ans=0.125 2024-08-13 17:21:20,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5850, loss[loss=0.1007, beats_loss=0.01047, ecapa_loss=0.0001913, whisper_loss=0.08828, over 22461.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001638, whisper_loss=0.0915, over 3905211.10 frames. ], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:21:26,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.78 vs. 
limit=22.5 2024-08-13 17:21:37,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.300e+01 2.602e+01 2.847e+01 5.570e+01, threshold=5.204e+01, percent-clipped=1.0 2024-08-13 17:21:44,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=22.5 2024-08-13 17:22:03,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2232610.0, ans=0.1 2024-08-13 17:22:06,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2232610.0, ans=0.1 2024-08-13 17:22:10,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2232610.0, ans=0.0 2024-08-13 17:22:18,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2232710.0, ans=0.0 2024-08-13 17:22:33,015 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5900, loss[loss=0.09461, beats_loss=0.01287, ecapa_loss=0.0001784, whisper_loss=0.07996, over 17392.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01085, ecapa_loss=0.0001641, whisper_loss=0.09109, over 3868562.81 frames. 
], batch size: 73, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:22:38,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2232810.0, ans=0.0 2024-08-13 17:22:45,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2232810.0, ans=0.1 2024-08-13 17:23:00,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2233010.0, ans=0.0 2024-08-13 17:23:06,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2233010.0, ans=0.125 2024-08-13 17:23:14,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2233110.0, ans=0.1 2024-08-13 17:23:18,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2233110.0, ans=0.0 2024-08-13 17:23:21,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2233110.0, ans=0.0 2024-08-13 17:23:21,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2233110.0, ans=0.1 2024-08-13 17:23:27,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2233110.0, ans=0.2 2024-08-13 17:23:29,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2233210.0, ans=0.125 2024-08-13 17:23:44,684 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 5950, loss[loss=0.1224, beats_loss=0.009642, ecapa_loss=0.0001788, whisper_loss=0.111, over 22659.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01093, ecapa_loss=0.0001634, whisper_loss=0.0906, over 3874186.76 frames. ], batch size: 89, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:23:47,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.20 vs. limit=10.0 2024-08-13 17:24:01,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.407e+01 2.699e+01 3.092e+01 2.272e+02, threshold=5.398e+01, percent-clipped=4.0 2024-08-13 17:24:14,033 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-13 17:24:15,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2233510.0, ans=0.09899494936611666 2024-08-13 17:24:31,102 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 17:24:33,086 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:24:35,660 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 17 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 17:24:52,008 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 17:24:57,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6000, loss[loss=0.1017, beats_loss=0.009251, ecapa_loss=0.0001603, whisper_loss=0.09089, over 20970.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001647, whisper_loss=0.09092, over 3911871.83 frames. ], batch size: 83, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:24:57,919 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 17:25:32,947 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005624, whisper_loss=0.2475, over 922467.00 frames. 
2024-08-13 17:25:52,001 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on SV_voxceleb1: loss=0.004549, beats_loss=0, ecapa_loss=0.0004549, whisper_loss=0, over 939242.00 frames. 2024-08-13 17:26:07,608 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1621, 3.9206, 4.0130, 4.0767], device='cuda:0') 2024-08-13 17:26:15,779 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0964, 3.4709, 3.8269, 3.8812], device='cuda:0') 2024-08-13 17:27:33,648 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on AT_audioset: loss=0.02369, beats_loss=0.02369, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 17:27:33,654 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-13 17:27:41,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2233810.0, ans=0.025 2024-08-13 17:27:50,578 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 17:27:55,127 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 17:28:09,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2234010.0, ans=0.0 2024-08-13 17:28:11,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2234010.0, ans=0.0 2024-08-13 17:28:21,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.25 vs. 
limit=15.0 2024-08-13 17:28:23,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2234110.0, ans=0.2 2024-08-13 17:28:24,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2234110.0, ans=0.125 2024-08-13 17:28:24,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2234110.0, ans=0.0 2024-08-13 17:28:33,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2024-08-13 17:28:45,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2234210.0, ans=0.125 2024-08-13 17:28:47,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6050, loss[loss=0.09192, beats_loss=0.01174, ecapa_loss=0.0001433, whisper_loss=0.07875, over 22776.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01093, ecapa_loss=0.0001637, whisper_loss=0.09103, over 3912233.63 frames. ], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:29:06,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.365e+01 2.582e+01 2.840e+01 3.927e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-13 17:29:16,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2234410.0, ans=0.0 2024-08-13 17:29:21,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2234510.0, ans=0.125 2024-08-13 17:29:22,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2234510.0, ans=0.125 2024-08-13 17:29:28,655 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 17:29:41,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2234610.0, ans=0.125 2024-08-13 17:30:03,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6100, loss[loss=0.0905, beats_loss=0.01252, ecapa_loss=0.0001912, whisper_loss=0.07608, over 19103.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001639, whisper_loss=0.09098, over 3908340.34 frames. ], batch size: 81, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:30:21,823 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 17:30:23,183 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 28 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 17:30:23,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2234910.0, ans=0.05 2024-08-13 17:30:26,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2234910.0, ans=0.0 2024-08-13 17:30:27,366 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 17:30:47,706 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 17:30:57,656 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-13 17:30:58,736 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 17:31:00,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2235210.0, ans=0.09899494936611666 2024-08-13 17:31:06,338 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 17:31:15,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6150, loss[loss=0.1292, beats_loss=0.008184, ecapa_loss=0.0001586, whisper_loss=0.1194, over 22883.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001645, whisper_loss=0.09148, over 3918091.05 frames. ], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:31:27,586 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 17:31:31,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2235410.0, ans=0.125 2024-08-13 17:31:33,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.323e+01 2.638e+01 2.979e+01 5.632e+01, threshold=5.276e+01, percent-clipped=1.0 2024-08-13 17:31:35,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2235410.0, ans=0.0 2024-08-13 17:31:38,665 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:32:29,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6200, loss[loss=0.09992, beats_loss=0.01228, ecapa_loss=0.0001588, whisper_loss=0.08606, over 18396.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001635, whisper_loss=0.09099, over 3931640.67 frames. 
], batch size: 74, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:32:35,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2235810.0, ans=0.125 2024-08-13 17:33:06,320 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:33:18,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-13 17:33:21,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2236110.0, ans=0.2 2024-08-13 17:33:21,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2236110.0, ans=0.125 2024-08-13 17:33:27,763 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 17:33:29,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=15.0 2024-08-13 17:33:32,784 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 17:33:36,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2236210.0, ans=0.0 2024-08-13 17:33:42,891 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:33:45,598 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-13 17:33:48,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6250, loss[loss=0.1374, beats_loss=0.007867, ecapa_loss=0.0001717, whisper_loss=0.1278, over 23590.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001638, whisper_loss=0.09127, over 3930722.76 frames. ], batch size: 90, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:34:05,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.429e+01 2.660e+01 2.868e+01 5.842e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 17:34:06,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-13 17:34:08,079 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 17:34:08,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2236410.0, ans=0.125 2024-08-13 17:34:14,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2236410.0, ans=0.125 2024-08-13 17:34:30,260 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:34:40,120 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 17:35:02,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2236710.0, ans=0.0 2024-08-13 17:35:03,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2236810.0, ans=0.035 2024-08-13 17:35:04,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6300, loss[loss=0.119, beats_loss=0.009816, ecapa_loss=0.0001609, whisper_loss=0.1076, over 23495.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001639, whisper_loss=0.09091, over 3903224.69 frames. 
], batch size: 94, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:35:07,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2236810.0, ans=0.125 2024-08-13 17:35:12,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2236810.0, ans=0.125 2024-08-13 17:35:12,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2236810.0, ans=0.125 2024-08-13 17:35:27,587 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-13 17:35:30,967 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 38 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 17:35:47,501 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 17:35:50,373 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 17:36:15,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2237210.0, ans=0.125 2024-08-13 17:36:22,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6350, loss[loss=0.09488, beats_loss=0.01064, ecapa_loss=0.0001597, whisper_loss=0.08264, over 22163.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001635, whisper_loss=0.09134, over 3902983.23 frames. ], batch size: 92, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:36:24,534 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 17:36:29,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2237310.0, ans=0.0 2024-08-13 17:36:41,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-08-13 17:36:42,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.465e+01 2.710e+01 3.055e+01 1.101e+02, threshold=5.419e+01, percent-clipped=2.0 2024-08-13 17:36:56,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2237510.0, ans=0.2 2024-08-13 17:37:03,436 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 17:37:12,405 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 17:37:27,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2237610.0, ans=0.125 2024-08-13 17:37:32,014 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 17:37:37,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2237710.0, ans=0.0 2024-08-13 17:37:39,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2237710.0, ans=0.0 2024-08-13 17:37:42,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=22.5 2024-08-13 17:37:44,656 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 17:37:44,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2237810.0, ans=0.125 2024-08-13 17:37:45,752 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6400, loss[loss=0.09848, beats_loss=0.01201, ecapa_loss=0.0001387, whisper_loss=0.08509, over 15103.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001629, whisper_loss=0.09165, over 3907856.15 frames. ], batch size: 59, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:37:46,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2024-08-13 17:38:00,472 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 17:38:13,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2237910.0, ans=0.125 2024-08-13 17:38:23,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-08-13 17:38:30,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2238010.0, ans=0.125 2024-08-13 17:38:57,523 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-13 17:39:03,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6450, loss[loss=0.1322, beats_loss=0.009742, ecapa_loss=0.0001651, whisper_loss=0.1208, over 22915.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001626, whisper_loss=0.0914, over 3930548.19 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:39:04,871 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 17:39:05,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2238310.0, ans=0.07 2024-08-13 17:39:12,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2238310.0, ans=0.0 2024-08-13 17:39:22,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.439e+01 2.707e+01 3.110e+01 4.905e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-13 17:39:28,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-13 17:39:51,338 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.237e-01 2024-08-13 17:40:08,062 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 17:40:18,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2238710.0, ans=0.125 2024-08-13 17:40:19,178 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 17:40:22,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6500, loss[loss=0.1192, beats_loss=0.01089, ecapa_loss=0.0001515, whisper_loss=0.1067, over 22497.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01081, ecapa_loss=0.0001629, whisper_loss=0.0922, over 3922822.83 frames. 
], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:40:35,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2238910.0, ans=0.05 2024-08-13 17:40:38,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2238910.0, ans=0.0 2024-08-13 17:40:39,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2238910.0, ans=0.1 2024-08-13 17:40:41,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2238910.0, ans=0.125 2024-08-13 17:40:52,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2239010.0, ans=0.125 2024-08-13 17:41:11,081 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 17:41:21,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2239110.0, ans=0.125 2024-08-13 17:41:27,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2239210.0, ans=0.125 2024-08-13 17:41:30,347 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 17:41:30,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2239210.0, ans=0.125 2024-08-13 17:41:39,276 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6550, loss[loss=0.1151, beats_loss=0.01069, ecapa_loss=0.0001525, whisper_loss=0.1028, over 22549.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.0001632, whisper_loss=0.09178, over 3960973.19 frames. 
], batch size: 89, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:41:57,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.430e+01 2.695e+01 2.938e+01 3.674e+01, threshold=5.390e+01, percent-clipped=0.0 2024-08-13 17:41:57,999 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 17:42:13,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2239510.0, ans=0.125 2024-08-13 17:42:32,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2239610.0, ans=0.1 2024-08-13 17:42:34,288 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-13 17:42:47,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2239710.0, ans=0.125 2024-08-13 17:42:57,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2239810.0, ans=0.04949747468305833 2024-08-13 17:42:58,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6600, loss[loss=0.1298, beats_loss=0.008285, ecapa_loss=0.0001355, whisper_loss=0.1202, over 18484.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001626, whisper_loss=0.09168, over 3946829.60 frames. ], batch size: 68, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:43:28,397 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-224000.pt 2024-08-13 17:43:34,450 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 17:43:44,261 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 36 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 17:43:52,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2240110.0, ans=0.125 2024-08-13 17:43:53,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2240110.0, ans=0.09899494936611666 2024-08-13 17:43:59,321 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 17:44:02,608 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 17:44:12,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2240210.0, ans=0.04949747468305833 2024-08-13 17:44:16,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2240210.0, ans=0.05 2024-08-13 17:44:24,683 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 17:44:26,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6650, loss[loss=0.1035, beats_loss=0.009649, ecapa_loss=0.0001943, whisper_loss=0.09189, over 18181.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01092, ecapa_loss=0.0001627, whisper_loss=0.09123, over 3963883.47 frames. ], batch size: 76, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:44:46,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.433e+01 2.609e+01 2.879e+01 3.999e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-13 17:45:03,839 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 17:45:27,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2240610.0, ans=0.2 2024-08-13 17:45:50,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6700, loss[loss=0.1101, beats_loss=0.009802, ecapa_loss=0.0002081, whisper_loss=0.09825, over 20778.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01088, ecapa_loss=0.0001629, whisper_loss=0.09111, over 3936374.41 frames. ], batch size: 87, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:46:06,140 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 15 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 17:46:07,882 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 21 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-13 17:46:25,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2241010.0, ans=0.125 2024-08-13 17:46:29,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-13 17:46:39,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2024-08-13 17:46:41,349 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 17:46:51,730 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 17:47:12,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.18 vs. limit=10.0 2024-08-13 17:47:15,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6750, loss[loss=0.1213, beats_loss=0.01085, ecapa_loss=0.0001776, whisper_loss=0.1086, over 18891.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001643, whisper_loss=0.09061, over 3875943.23 frames. ], batch size: 74, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:47:16,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2241310.0, ans=0.0 2024-08-13 17:47:19,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2241310.0, ans=0.2 2024-08-13 17:47:25,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2241310.0, ans=0.125 2024-08-13 17:47:29,259 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 17:47:37,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.480e+01 2.825e+01 3.141e+01 1.321e+02, threshold=5.651e+01, percent-clipped=2.0 2024-08-13 17:47:39,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2241410.0, ans=0.2 2024-08-13 17:47:45,163 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-13 17:48:03,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2241610.0, ans=0.125 2024-08-13 17:48:05,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2241610.0, ans=0.0 2024-08-13 17:48:25,852 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-13 17:48:38,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6800, loss[loss=0.102, beats_loss=0.0107, ecapa_loss=0.0001544, whisper_loss=0.08977, over 16543.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001637, whisper_loss=0.09131, over 3885607.88 frames. 
], batch size: 64, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:48:45,228 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-13 17:48:49,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2241810.0, ans=0.125 2024-08-13 17:49:37,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2242110.0, ans=0.0 2024-08-13 17:49:40,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2242110.0, ans=0.125 2024-08-13 17:49:48,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2242210.0, ans=0.0 2024-08-13 17:49:58,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2242210.0, ans=0.2 2024-08-13 17:50:00,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6850, loss[loss=0.1083, beats_loss=0.01118, ecapa_loss=0.0001729, whisper_loss=0.09537, over 22370.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001631, whisper_loss=0.09089, over 3860540.27 frames. ], batch size: 90, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:50:15,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2242410.0, ans=0.0 2024-08-13 17:50:20,687 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.361e+01 2.636e+01 2.867e+01 1.284e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 17:50:28,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2242410.0, ans=0.07 2024-08-13 17:50:42,721 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 17:51:16,146 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 17:51:20,996 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6900, loss[loss=0.1206, beats_loss=0.008051, ecapa_loss=0.0001658, whisper_loss=0.1109, over 14296.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001638, whisper_loss=0.09125, over 3892072.44 frames. ], batch size: 55, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:51:26,003 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 17:51:54,894 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 17:51:55,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-13 17:51:56,159 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 17:52:03,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2243010.0, ans=0.1 2024-08-13 17:52:04,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2243010.0, ans=0.2 2024-08-13 17:52:04,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2243010.0, ans=0.1 2024-08-13 17:52:11,667 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 17:52:42,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 6950, loss[loss=0.1152, beats_loss=0.009818, ecapa_loss=0.0001711, whisper_loss=0.1037, over 17020.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001626, whisper_loss=0.09154, over 3906094.21 frames. ], batch size: 65, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:52:52,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2243310.0, ans=0.0 2024-08-13 17:53:02,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.338e+01 2.546e+01 2.937e+01 5.530e+01, threshold=5.093e+01, percent-clipped=1.0 2024-08-13 17:53:03,227 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 17:53:07,691 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 17:53:07,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2243410.0, ans=0.0 2024-08-13 17:53:33,796 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 17:54:03,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7000, loss[loss=0.1167, beats_loss=0.01045, ecapa_loss=0.0001703, whisper_loss=0.1045, over 15030.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01085, ecapa_loss=0.0001633, whisper_loss=0.09181, over 3901521.88 frames. ], batch size: 57, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:54:05,466 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 17:54:05,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2243810.0, ans=0.125 2024-08-13 17:54:13,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2243810.0, ans=0.5 2024-08-13 17:54:37,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2244010.0, ans=0.125 2024-08-13 17:54:40,350 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-13 17:54:47,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-13 17:55:09,150 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 17:55:22,577 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 17:55:23,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2024-08-13 17:55:26,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7050, loss[loss=0.1039, beats_loss=0.008545, ecapa_loss=0.000178, whisper_loss=0.0936, over 22594.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001644, whisper_loss=0.09147, over 3951043.74 frames. ], batch size: 93, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:55:26,999 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 17:55:33,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2244310.0, ans=0.125 2024-08-13 17:55:33,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2244310.0, ans=0.125 2024-08-13 17:55:39,780 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 17:55:48,082 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.461e+01 2.675e+01 2.991e+01 1.291e+02, threshold=5.351e+01, percent-clipped=1.0 2024-08-13 17:55:53,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2244410.0, ans=0.125 2024-08-13 17:56:04,584 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:56:15,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2244610.0, ans=0.125 2024-08-13 17:56:16,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2024-08-13 17:56:33,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2244710.0, ans=0.04949747468305833 2024-08-13 17:56:35,934 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 17:56:46,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2244810.0, ans=0.1 2024-08-13 17:56:47,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7100, loss[loss=0.09867, beats_loss=0.006595, ecapa_loss=0.0002089, whisper_loss=0.08999, over 14680.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01091, ecapa_loss=0.000163, whisper_loss=0.09035, over 3897204.65 frames. ], batch size: 58, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:56:54,406 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 17:56:59,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2244810.0, ans=0.125 2024-08-13 17:57:09,035 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 17:57:38,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2245110.0, ans=0.0 2024-08-13 17:57:46,317 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 17:57:48,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2245110.0, ans=0.125 2024-08-13 17:58:02,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2245210.0, ans=0.0 2024-08-13 17:58:08,754 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7150, loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001583, whisper_loss=0.09189, over 23700.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01091, ecapa_loss=0.0001643, whisper_loss=0.09018, over 3898325.41 frames. 
], batch size: 94, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:58:09,227 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 17:58:15,634 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 9 from Vox, 22 fro AS 2024-08-13 17:58:19,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2245310.0, ans=0.125 2024-08-13 17:58:20,613 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-13 17:58:31,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.399e+01 2.676e+01 3.068e+01 5.307e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-13 17:58:31,506 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 17:58:43,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=2245510.0, ans=0.02 2024-08-13 17:58:45,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2245510.0, ans=0.125 2024-08-13 17:59:01,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2245610.0, ans=0.1 2024-08-13 17:59:07,806 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 17:59:08,984 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 17:59:26,654 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
26 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 17:59:30,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2245710.0, ans=0.0 2024-08-13 17:59:33,174 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7200, loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001752, whisper_loss=0.08958, over 23176.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001639, whisper_loss=0.09129, over 3913743.39 frames. ], batch size: 93, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:59:33,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2245810.0, ans=0.125 2024-08-13 17:59:59,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2245910.0, ans=0.07 2024-08-13 18:00:17,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2246010.0, ans=0.5 2024-08-13 18:00:17,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2246010.0, ans=0.1 2024-08-13 18:00:26,226 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:00:46,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0 2024-08-13 18:00:47,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2246210.0, ans=0.1 2024-08-13 18:00:53,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7250, loss[loss=0.0997, beats_loss=0.01042, ecapa_loss=0.0001534, whisper_loss=0.08775, over 22038.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001632, whisper_loss=0.09139, over 3932784.72 frames. ], batch size: 90, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:01:02,628 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 18:01:05,740 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 18:01:12,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-08-13 18:01:15,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.504e+01 2.815e+01 3.088e+01 1.145e+02, threshold=5.629e+01, percent-clipped=1.0 2024-08-13 18:01:26,236 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 18:01:42,281 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 18:02:08,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2246710.0, ans=0.125 2024-08-13 18:02:14,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.28 vs. limit=6.0 2024-08-13 18:02:15,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7300, loss[loss=0.0961, beats_loss=0.01289, ecapa_loss=0.0001379, whisper_loss=0.08184, over 17740.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001627, whisper_loss=0.09123, over 3879407.77 frames. 
], batch size: 71, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:02:36,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2246910.0, ans=0.5 2024-08-13 18:02:44,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2246910.0, ans=0.125 2024-08-13 18:02:52,672 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 18:02:54,138 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 18:02:59,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2247010.0, ans=0.1 2024-08-13 18:03:02,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2247110.0, ans=0.0 2024-08-13 18:03:03,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2024-08-13 18:03:09,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.53 vs. 
limit=15.0 2024-08-13 18:03:17,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2247110.0, ans=0.125 2024-08-13 18:03:23,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2247210.0, ans=0.2 2024-08-13 18:03:25,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2247210.0, ans=0.1 2024-08-13 18:03:27,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2247210.0, ans=0.1 2024-08-13 18:03:36,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7350, loss[loss=0.11, beats_loss=0.009282, ecapa_loss=0.0001609, whisper_loss=0.09907, over 22533.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001633, whisper_loss=0.0909, over 3853532.49 frames. ], batch size: 90, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:03:44,642 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 18:03:54,500 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 18:03:58,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.425e+01 2.697e+01 3.109e+01 4.252e+01, threshold=5.395e+01, percent-clipped=0.0 2024-08-13 18:04:14,237 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
31 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-13 18:04:17,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2247510.0, ans=0.1 2024-08-13 18:04:18,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2247510.0, ans=0.125 2024-08-13 18:04:48,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2247710.0, ans=0.0 2024-08-13 18:04:48,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2247710.0, ans=0.125 2024-08-13 18:04:59,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7400, loss[loss=0.09159, beats_loss=0.009399, ecapa_loss=0.0001609, whisper_loss=0.08058, over 14494.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001633, whisper_loss=0.09129, over 3874722.56 frames. ], batch size: 57, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:05:54,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-08-13 18:05:55,332 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 18:05:55,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2248110.0, ans=0.125 2024-08-13 18:06:17,272 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7450, loss[loss=0.1182, beats_loss=0.009898, ecapa_loss=0.0001657, whisper_loss=0.1067, over 16731.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001632, whisper_loss=0.09099, over 3880430.66 frames. 
], batch size: 66, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:06:21,074 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 18:06:37,369 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.475e+01 2.713e+01 3.023e+01 4.384e+01, threshold=5.427e+01, percent-clipped=0.0 2024-08-13 18:06:58,707 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 18:07:03,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2248610.0, ans=0.025 2024-08-13 18:07:06,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.03 vs. limit=15.0 2024-08-13 18:07:07,150 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 41 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 18:07:32,935 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 18:07:37,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7500, loss[loss=0.1045, beats_loss=0.01209, ecapa_loss=0.0001837, whisper_loss=0.09059, over 21218.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001627, whisper_loss=0.09184, over 3892782.29 frames. ], batch size: 91, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:07:42,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2248810.0, ans=0.125 2024-08-13 18:07:57,901 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 18:08:06,035 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 18:08:17,878 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
23 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-13 18:08:23,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-13 18:08:31,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2249110.0, ans=0.1 2024-08-13 18:08:37,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2249110.0, ans=0.2 2024-08-13 18:08:54,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2249210.0, ans=0.125 2024-08-13 18:08:58,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7550, loss[loss=0.1014, beats_loss=0.0111, ecapa_loss=0.0001778, whisper_loss=0.08853, over 17668.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001627, whisper_loss=0.09189, over 3877211.53 frames. ], batch size: 73, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:09:19,251 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.370e+01 2.691e+01 3.011e+01 5.049e+01, threshold=5.381e+01, percent-clipped=0.0 2024-08-13 18:10:01,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2249710.0, ans=0.04949747468305833 2024-08-13 18:10:05,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2249710.0, ans=0.02 2024-08-13 18:10:17,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7600, loss[loss=0.1005, beats_loss=0.01038, ecapa_loss=0.0001201, whisper_loss=0.08891, over 15101.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.0001633, whisper_loss=0.09203, over 3880196.61 frames. 
], batch size: 57, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:10:32,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2249910.0, ans=0.125 2024-08-13 18:11:06,810 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 18:11:08,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2250110.0, ans=0.125 2024-08-13 18:11:15,429 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 18:11:23,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2250210.0, ans=0.125 2024-08-13 18:11:32,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-08-13 18:11:38,300 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7650, loss[loss=0.0844, beats_loss=0.01159, ecapa_loss=0.0001623, whisper_loss=0.07119, over 15689.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.0001636, whisper_loss=0.09182, over 3875363.42 frames. ], batch size: 62, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:11:38,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2250310.0, ans=0.125 2024-08-13 18:11:39,763 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 18:11:41,523 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 18:11:57,716 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.418e+01 2.664e+01 3.048e+01 4.401e+01, threshold=5.328e+01, percent-clipped=0.0 2024-08-13 18:12:07,918 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 18:12:24,820 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 18:12:28,892 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 18:12:48,259 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 18:12:52,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7700, loss[loss=0.09935, beats_loss=0.01146, ecapa_loss=0.0001485, whisper_loss=0.08641, over 23638.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01073, ecapa_loss=0.0001636, whisper_loss=0.09158, over 3883617.54 frames. ], batch size: 95, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:13:09,209 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 18:13:20,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2250910.0, ans=0.125 2024-08-13 18:13:21,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2251010.0, ans=0.125 2024-08-13 18:13:28,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2251010.0, ans=0.125 2024-08-13 18:13:31,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2251010.0, ans=0.95 2024-08-13 18:14:05,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7750, loss[loss=0.08218, beats_loss=0.01123, ecapa_loss=0.0001878, whisper_loss=0.06908, over 20001.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001628, whisper_loss=0.09051, over 3865290.19 frames. ], batch size: 86, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:14:06,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2251310.0, ans=0.125 2024-08-13 18:14:08,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2251310.0, ans=0.1 2024-08-13 18:14:20,274 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
28 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 18:14:23,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2251410.0, ans=0.2 2024-08-13 18:14:24,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.510e+01 2.727e+01 3.092e+01 1.354e+02, threshold=5.455e+01, percent-clipped=2.0 2024-08-13 18:14:25,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2251410.0, ans=0.125 2024-08-13 18:14:36,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.09 vs. limit=6.0 2024-08-13 18:14:39,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2251510.0, ans=0.1 2024-08-13 18:15:14,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-13 18:15:16,992 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.793e-01 2024-08-13 18:15:17,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7800, loss[loss=0.1029, beats_loss=0.00944, ecapa_loss=0.0001642, whisper_loss=0.09186, over 18075.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001629, whisper_loss=0.09132, over 3869957.07 frames. 
], batch size: 70, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:15:37,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2251910.0, ans=0.1 2024-08-13 18:15:46,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2251910.0, ans=0.125 2024-08-13 18:15:53,677 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 18:15:53,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2252010.0, ans=0.125 2024-08-13 18:16:08,629 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 18:16:16,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2252210.0, ans=0.125 2024-08-13 18:16:30,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7850, loss[loss=0.07803, beats_loss=0.01168, ecapa_loss=0.0001594, whisper_loss=0.06476, over 19751.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001636, whisper_loss=0.09091, over 3875321.91 frames. ], batch size: 79, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:16:30,732 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 18:16:30,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2252310.0, ans=0.125 2024-08-13 18:16:32,349 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 18:16:44,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2252410.0, ans=0.125 2024-08-13 18:16:44,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2252410.0, ans=0.125 2024-08-13 18:16:48,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.348e+01 2.635e+01 3.053e+01 4.732e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-13 18:16:51,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2252410.0, ans=0.125 2024-08-13 18:17:10,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2252510.0, ans=0.125 2024-08-13 18:17:29,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2252710.0, ans=0.125 2024-08-13 18:17:32,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-13 18:17:35,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2252710.0, ans=0.125 2024-08-13 18:17:41,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2252710.0, ans=0.125 2024-08-13 18:17:42,560 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-13 18:17:44,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7900, loss[loss=0.1096, beats_loss=0.008186, ecapa_loss=0.0001834, whisper_loss=0.09955, over 21321.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001633, whisper_loss=0.09157, over 3868150.05 frames. ], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:17:47,212 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 18:17:51,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2252810.0, ans=0.025 2024-08-13 18:17:59,789 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 18:18:11,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2253010.0, ans=0.125 2024-08-13 18:18:14,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2253010.0, ans=0.125 2024-08-13 18:18:18,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2253010.0, ans=0.2 2024-08-13 18:18:23,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2024-08-13 18:18:28,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2253110.0, ans=0.125 2024-08-13 18:18:32,644 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 18:18:36,835 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 18:18:38,496 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 18:18:41,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2253210.0, ans=0.125 2024-08-13 18:18:50,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2253210.0, ans=0.125 2024-08-13 18:18:53,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2253210.0, ans=0.0 2024-08-13 18:18:57,425 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 7950, loss[loss=0.09036, beats_loss=0.01128, ecapa_loss=0.0001836, whisper_loss=0.07724, over 17275.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001627, whisper_loss=0.09229, over 3900352.03 frames. ], batch size: 72, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:19:12,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2253410.0, ans=0.2 2024-08-13 18:19:12,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. limit=10.0 2024-08-13 18:19:15,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.363e+01 2.645e+01 3.044e+01 5.205e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-13 18:19:27,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2253510.0, ans=0.125 2024-08-13 18:19:36,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2253510.0, ans=0.125 2024-08-13 18:19:39,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2253510.0, ans=0.125 2024-08-13 18:19:54,567 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 18:20:06,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2253710.0, ans=0.125 2024-08-13 18:20:08,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2253710.0, ans=0.05 2024-08-13 18:20:13,348 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8000, loss[loss=0.08014, beats_loss=0.01237, ecapa_loss=0.0001398, whisper_loss=0.06637, over 16747.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001609, whisper_loss=0.09202, over 3899806.09 frames. ], batch size: 68, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:20:17,872 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 18:20:36,053 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-13 18:20:38,976 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 18:20:39,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2253910.0, ans=0.125 2024-08-13 18:21:00,828 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 18:21:26,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8050, loss[loss=0.09754, beats_loss=0.009088, ecapa_loss=0.0001478, whisper_loss=0.08697, over 15417.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01078, ecapa_loss=0.0001598, whisper_loss=0.09206, over 3886560.67 frames. ], batch size: 57, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:21:31,078 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 18:21:41,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2254410.0, ans=0.1 2024-08-13 18:21:42,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2254410.0, ans=0.125 2024-08-13 18:21:43,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-13 18:21:46,289 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.332e+01 2.558e+01 3.003e+01 5.582e+01, threshold=5.115e+01, percent-clipped=1.0 2024-08-13 18:21:54,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2254510.0, ans=0.07 2024-08-13 18:22:01,941 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 18:22:04,544 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 18:22:09,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2254610.0, ans=0.125 2024-08-13 18:22:12,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2254610.0, ans=0.0 2024-08-13 18:22:21,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2254610.0, ans=0.125 2024-08-13 18:22:26,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2254710.0, ans=0.0 2024-08-13 18:22:26,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2254710.0, ans=0.0 2024-08-13 18:22:38,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8100, loss[loss=0.1006, beats_loss=0.009633, ecapa_loss=0.0001412, whisper_loss=0.08959, over 19800.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01084, ecapa_loss=0.0001607, whisper_loss=0.09149, over 3874890.45 frames. ], batch size: 76, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:22:51,240 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 18:22:52,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2254910.0, ans=0.125 2024-08-13 18:22:54,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2254910.0, ans=0.0 2024-08-13 18:22:56,381 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-13 18:23:00,931 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 18:23:08,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2255010.0, ans=0.0 2024-08-13 18:23:12,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2255010.0, ans=0.0 2024-08-13 18:23:34,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2255210.0, ans=0.0 2024-08-13 18:23:40,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2255210.0, ans=0.125 2024-08-13 18:23:49,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8150, loss[loss=0.1052, beats_loss=0.009559, ecapa_loss=0.0001697, whisper_loss=0.09391, over 21739.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001609, whisper_loss=0.09119, over 3871443.91 frames. ], batch size: 86, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:24:04,572 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 18:24:09,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.440e+01 2.841e+01 3.164e+01 5.500e+01, threshold=5.681e+01, percent-clipped=1.0 2024-08-13 18:24:10,766 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-13 18:24:24,525 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 18:24:30,105 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 18:24:32,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2255610.0, ans=0.1 2024-08-13 18:24:43,488 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 18:24:43,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2255610.0, ans=0.125 2024-08-13 18:24:46,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2255710.0, ans=0.125 2024-08-13 18:24:48,396 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 18:24:55,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2255710.0, ans=0.2 2024-08-13 18:24:58,280 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 18:25:02,907 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8200, loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001458, whisper_loss=0.09061, over 16904.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001608, whisper_loss=0.09119, over 3890714.63 frames. ], batch size: 66, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:25:08,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.63 vs. limit=22.5 2024-08-13 18:25:54,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2256110.0, ans=0.0 2024-08-13 18:26:13,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2256310.0, ans=0.0 2024-08-13 18:26:14,163 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8250, loss[loss=0.08833, beats_loss=0.01404, ecapa_loss=0.0001146, whisper_loss=0.07314, over 23508.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001606, whisper_loss=0.09139, over 3912499.25 frames. 
], batch size: 92, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:26:17,118 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 18:26:17,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2256310.0, ans=0.0 2024-08-13 18:26:27,401 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.546e-02 2024-08-13 18:26:32,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.303e+01 2.576e+01 2.826e+01 3.811e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-13 18:26:33,004 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 18:26:38,222 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 18:26:42,661 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 18:26:45,617 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 18:26:47,040 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 18:27:14,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2256710.0, ans=0.2 2024-08-13 18:27:22,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2256710.0, ans=0.125 2024-08-13 18:27:22,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. 
limit=15.0 2024-08-13 18:27:24,815 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04663357511162758, model_norm_threshold=51.52228546142578 2024-08-13 18:27:25,054 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.95, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.156e+06, grad_sumsq=1.333e+05, orig_rms_sq=8.675e+00 2024-08-13 18:27:25,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8300, loss[loss=0.1033, beats_loss=0.01248, ecapa_loss=0.0001363, whisper_loss=0.0895, over 21435.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001608, whisper_loss=0.0909, over 3923843.88 frames. ], batch size: 86, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:27:25,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-13 18:27:28,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2256810.0, ans=0.125 2024-08-13 18:27:35,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2256810.0, ans=0.125 2024-08-13 18:27:39,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2256910.0, ans=0.0 2024-08-13 18:27:46,000 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.776e-01 2024-08-13 18:27:46,884 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 18:27:53,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2257010.0, ans=0.125 2024-08-13 18:27:54,760 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
19 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-13 18:27:56,538 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:27:56,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-13 18:28:00,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2257010.0, ans=0.2 2024-08-13 18:28:11,995 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 18:28:14,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2257110.0, ans=0.0 2024-08-13 18:28:18,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2257210.0, ans=0.1 2024-08-13 18:28:21,027 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-13 18:28:27,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.84 vs. limit=15.0 2024-08-13 18:28:30,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8350, loss[loss=0.1166, beats_loss=0.009189, ecapa_loss=0.0001891, whisper_loss=0.1056, over 22377.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01092, ecapa_loss=0.000162, whisper_loss=0.09074, over 3932110.61 frames. ], batch size: 91, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:28:33,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2257310.0, ans=0.125 2024-08-13 18:28:37,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.02 vs. 
limit=15.0 2024-08-13 18:28:42,547 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 18:28:44,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2257410.0, ans=0.2 2024-08-13 18:28:47,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.502e+01 2.789e+01 3.217e+01 1.105e+03, threshold=5.579e+01, percent-clipped=3.0 2024-08-13 18:29:16,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2257610.0, ans=0.2 2024-08-13 18:29:25,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2257710.0, ans=0.0 2024-08-13 18:29:33,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2257710.0, ans=0.125 2024-08-13 18:29:35,838 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8400, loss[loss=0.1075, beats_loss=0.01007, ecapa_loss=0.0001409, whisper_loss=0.09599, over 15555.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001623, whisper_loss=0.0908, over 3887246.46 frames. 
], batch size: 59, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:29:51,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2257910.0, ans=0.04949747468305833 2024-08-13 18:29:53,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2257910.0, ans=0.2 2024-08-13 18:30:03,726 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:30:04,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2258010.0, ans=0.125 2024-08-13 18:30:08,667 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 18:30:11,603 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 18:30:14,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2258010.0, ans=0.05 2024-08-13 18:30:35,071 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 18:30:42,838 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8450, loss[loss=0.1262, beats_loss=0.008712, ecapa_loss=0.0001783, whisper_loss=0.1157, over 23613.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01086, ecapa_loss=0.0001628, whisper_loss=0.09048, over 3865096.35 frames. ], batch size: 94, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:30:43,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2258310.0, ans=0.0 2024-08-13 18:30:55,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.68 vs. 
limit=8.0 2024-08-13 18:30:59,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.525e+01 2.749e+01 3.077e+01 1.697e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 18:31:02,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2258410.0, ans=0.1 2024-08-13 18:31:07,705 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 18:31:14,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2258510.0, ans=0.125 2024-08-13 18:31:18,307 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 18:31:28,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2024-08-13 18:31:39,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2258710.0, ans=0.2 2024-08-13 18:31:48,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8500, loss[loss=0.0883, beats_loss=0.01045, ecapa_loss=0.0001705, whisper_loss=0.07615, over 16015.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01085, ecapa_loss=0.0001628, whisper_loss=0.08998, over 3886542.59 frames. ], batch size: 65, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:31:52,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2258810.0, ans=0.2 2024-08-13 18:32:16,151 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 18:32:29,522 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 18:32:45,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-13 18:32:48,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2024-08-13 18:32:52,012 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.299e+01 2024-08-13 18:32:54,189 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8550, loss[loss=0.08624, beats_loss=0.009749, ecapa_loss=0.000166, whisper_loss=0.07483, over 16548.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001622, whisper_loss=0.09084, over 3890674.36 frames. ], batch size: 66, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:33:03,279 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 18:33:07,616 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.62 vs. 
limit=15.0 2024-08-13 18:33:10,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2259410.0, ans=0.125 2024-08-13 18:33:10,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2259410.0, ans=0.1 2024-08-13 18:33:10,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.379e+01 2.646e+01 2.938e+01 4.520e+01, threshold=5.292e+01, percent-clipped=0.0 2024-08-13 18:33:26,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2259510.0, ans=0.2 2024-08-13 18:33:32,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-08-13 18:33:33,439 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 18:33:36,000 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 18:33:39,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2259610.0, ans=15.0 2024-08-13 18:33:46,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2259710.0, ans=0.2 2024-08-13 18:33:56,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=15.0 2024-08-13 18:33:58,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8600, loss[loss=0.1181, beats_loss=0.009065, ecapa_loss=0.0001537, whisper_loss=0.1075, over 18094.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01082, ecapa_loss=0.000162, whisper_loss=0.09089, over 3905749.81 frames. 
], batch size: 67, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:34:03,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2259810.0, ans=0.125 2024-08-13 18:34:18,394 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 18:34:31,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2260010.0, ans=0.0 2024-08-13 18:34:32,460 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 18:34:50,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2260110.0, ans=0.125 2024-08-13 18:35:05,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2260310.0, ans=0.0 2024-08-13 18:35:06,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8650, loss[loss=0.07621, beats_loss=0.01364, ecapa_loss=0.0001809, whisper_loss=0.06076, over 15976.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01087, ecapa_loss=0.0001628, whisper_loss=0.09018, over 3891622.89 frames. 
], batch size: 69, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:35:24,163 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.420e+01 2.581e+01 2.912e+01 4.652e+01, threshold=5.162e+01, percent-clipped=0.0
2024-08-13 18:35:27,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2260410.0, ans=0.0
2024-08-13 18:35:31,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2260410.0, ans=0.125
2024-08-13 18:35:32,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2260510.0, ans=0.0
2024-08-13 18:35:45,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2260610.0, ans=0.0
2024-08-13 18:35:53,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2260610.0, ans=0.1
2024-08-13 18:35:57,786 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 from AS
2024-08-13 18:36:14,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8700, loss[loss=0.09811, beats_loss=0.01262, ecapa_loss=0.0001255, whisper_loss=0.08423, over 17251.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.0001635, whisper_loss=0.09046, over 3858670.98 frames. ], batch size: 68, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:36:32,437 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 from AS
2024-08-13 18:37:32,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8750, loss[loss=0.08913, beats_loss=0.0133, ecapa_loss=0.0001362, whisper_loss=0.07447, over 15138.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001637, whisper_loss=0.09089, over 3866072.18 frames. ], batch size: 60, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:37:50,599 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.392e+01 2.713e+01 3.025e+01 4.261e+01, threshold=5.425e+01, percent-clipped=0.0
2024-08-13 18:38:17,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2261610.0, ans=0.2
2024-08-13 18:38:23,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5
2024-08-13 18:38:33,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2261610.0, ans=0.0
2024-08-13 18:38:34,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2261710.0, ans=0.125
2024-08-13 18:38:46,842 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 20 from Vox, 24 from AS
2024-08-13 18:38:54,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8800, loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001815, whisper_loss=0.08884, over 15371.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.000163, whisper_loss=0.09113, over 3906279.98 frames. ], batch size: 64, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:38:56,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2261810.0, ans=0.1
2024-08-13 18:39:00,706 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 18 from Vox, 18 from AS
2024-08-13 18:39:00,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2261810.0, ans=0.125
2024-08-13 18:39:22,954 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 from AS
2024-08-13 18:39:58,098 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 18:40:04,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2262110.0, ans=0.0
2024-08-13 18:40:10,015 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 from AS
2024-08-13 18:40:29,437 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 from AS
2024-08-13 18:40:33,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8850, loss[loss=0.1006, beats_loss=0.0116, ecapa_loss=0.0001444, whisper_loss=0.08752, over 23115.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001612, whisper_loss=0.09144, over 3888016.88 frames. ], batch size: 94, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:40:36,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0
2024-08-13 18:40:39,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2262310.0, ans=0.95
2024-08-13 18:40:57,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.352e+01 2.612e+01 3.083e+01 5.604e+01, threshold=5.223e+01, percent-clipped=1.0
2024-08-13 18:41:21,691 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 20 from Vox, 45 from AS
2024-08-13 18:41:23,416 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 from AS
2024-08-13 18:41:23,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2262510.0, ans=0.0
2024-08-13 18:41:32,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2262610.0, ans=0.125
2024-08-13 18:41:34,927 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 15 from Vox, 30 from AS
2024-08-13 18:41:35,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2262610.0, ans=0.0
2024-08-13 18:41:55,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2262710.0, ans=0.0
2024-08-13 18:42:09,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8900, loss[loss=0.11, beats_loss=0.01006, ecapa_loss=0.0001631, whisper_loss=0.09831, over 23014.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001612, whisper_loss=0.09159, over 3866301.16 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:42:18,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2262810.0, ans=0.0
2024-08-13 18:42:24,257 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 27 from Vox, 40 from AS
2024-08-13 18:42:50,328 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 from AS
2024-08-13 18:42:50,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0
2024-08-13 18:43:02,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2263010.0, ans=0.0
2024-08-13 18:43:21,836 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 18:43:33,207 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 8950, loss[loss=0.1054, beats_loss=0.01124, ecapa_loss=0.0001245, whisper_loss=0.09291, over 21131.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001602, whisper_loss=0.09111, over 3864452.91 frames. ], batch size: 84, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:43:33,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2263310.0, ans=0.0
2024-08-13 18:43:35,776 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 18:43:42,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2263310.0, ans=0.0
2024-08-13 18:43:48,668 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:43:49,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.368e+01 2.604e+01 2.902e+01 4.386e+01, threshold=5.207e+01, percent-clipped=0.0
2024-08-13 18:44:05,465 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 14 from Vox, 20 from AS
2024-08-13 18:44:27,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=15.0
2024-08-13 18:44:34,246 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 24 from Vox, 41 from AS
2024-08-13 18:44:38,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9000, loss[loss=0.1042, beats_loss=0.009801, ecapa_loss=0.0001545, whisper_loss=0.09289, over 20259.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001607, whisper_loss=0.0908, over 3843261.71 frames. ], batch size: 81, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:44:38,244 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-13 18:45:17,641 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005571, whisper_loss=0.2482, over 922467.00 frames.
2024-08-13 18:45:37,294 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on SV_voxceleb1: loss=0.004514, beats_loss=0, ecapa_loss=0.0004514, whisper_loss=0, over 939242.00 frames.
2024-08-13 18:46:33,891 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.0057, 1.5657, 1.5922, 1.4880], device='cuda:0')
2024-08-13 18:47:17,665 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6611, 2.5893, 2.2685, 1.7520, 2.1685, 2.0064, 2.4581, 2.3029], device='cuda:0')
2024-08-13 18:47:36,778 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.6789, 5.5159, 5.5708, 5.6235], device='cuda:0')
2024-08-13 18:47:39,569 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 18:47:39,573 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB
2024-08-13 18:48:03,978 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 from AS
2024-08-13 18:48:05,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0
2024-08-13 18:48:20,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2264110.0, ans=0.1
2024-08-13 18:48:24,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2264110.0, ans=0.125
2024-08-13 18:48:37,834 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 from AS
2024-08-13 18:48:43,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=12.0
2024-08-13 18:48:47,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9050, loss[loss=0.08665, beats_loss=0.01405, ecapa_loss=0.000119, whisper_loss=0.07141, over 23483.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001623, whisper_loss=0.0908, over 3815418.69 frames. ], batch size: 95, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:48:49,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0
2024-08-13 18:49:01,067 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 15 from LS+wenet, 23 from Vox, 32 from AS
2024-08-13 18:49:05,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.411e+01 2.657e+01 3.042e+01 5.076e+01, threshold=5.314e+01, percent-clipped=0.0
2024-08-13 18:49:09,499 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 from AS
2024-08-13 18:49:15,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2264510.0, ans=0.125
2024-08-13 18:49:16,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.84 vs. limit=10.0
2024-08-13 18:49:26,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2264510.0, ans=0.125
2024-08-13 18:49:26,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2264510.0, ans=0.0
2024-08-13 18:49:32,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2264610.0, ans=0.125
2024-08-13 18:49:52,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2264710.0, ans=0.125
2024-08-13 18:49:57,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9100, loss[loss=0.1077, beats_loss=0.0115, ecapa_loss=0.0001949, whisper_loss=0.09426, over 20356.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001617, whisper_loss=0.09085, over 3809056.10 frames. ], batch size: 89, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:50:02,375 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 from AS
2024-08-13 18:50:15,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2264910.0, ans=0.125
2024-08-13 18:50:17,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2264910.0, ans=0.2
2024-08-13 18:50:25,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0
2024-08-13 18:50:29,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2265010.0, ans=0.125
2024-08-13 18:50:35,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0
2024-08-13 18:50:39,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2265110.0, ans=0.0
2024-08-13 18:50:42,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2265110.0, ans=0.0
2024-08-13 18:50:43,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2265110.0, ans=0.1
2024-08-13 18:50:50,290 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 from AS
2024-08-13 18:50:50,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2265110.0, ans=0.125
2024-08-13 18:50:50,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2265110.0, ans=0.0
2024-08-13 18:50:57,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2265210.0, ans=0.2
2024-08-13 18:51:07,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9150, loss[loss=0.08925, beats_loss=0.009275, ecapa_loss=0.0002207, whisper_loss=0.07777, over 14746.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001625, whisper_loss=0.09067, over 3838770.28 frames. ], batch size: 63, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:51:18,230 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 from AS
2024-08-13 18:51:19,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2265310.0, ans=0.125
2024-08-13 18:51:26,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.406e+01 2.793e+01 3.104e+01 4.161e+01, threshold=5.587e+01, percent-clipped=0.0
2024-08-13 18:51:34,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2265410.0, ans=0.125
2024-08-13 18:51:38,765 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 11 from Vox, 26 from AS
2024-08-13 18:51:42,948 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 10 from Vox, 29 from AS
2024-08-13 18:51:58,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2265610.0, ans=0.125
2024-08-13 18:52:00,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2265610.0, ans=0.0
2024-08-13 18:52:03,178 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS
2024-08-13 18:52:17,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9200, loss[loss=0.1008, beats_loss=0.009857, ecapa_loss=0.000179, whisper_loss=0.08919, over 21458.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.000163, whisper_loss=0.09068, over 3852229.49 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:52:32,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2265910.0, ans=0.0
2024-08-13 18:52:56,164 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 18:53:12,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5
2024-08-13 18:53:16,824 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS
2024-08-13 18:53:19,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2266210.0, ans=0.125
2024-08-13 18:53:24,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9250, loss[loss=0.104, beats_loss=0.01259, ecapa_loss=0.000161, whisper_loss=0.08978, over 22428.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001634, whisper_loss=0.09029, over 3863346.91 frames. ], batch size: 93, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:53:31,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2266310.0, ans=0.1
2024-08-13 18:53:41,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.413e+01 2.566e+01 3.082e+01 5.176e+01, threshold=5.131e+01, percent-clipped=0.0
2024-08-13 18:53:44,475 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 from AS
2024-08-13 18:53:46,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2024-08-13 18:53:58,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2266510.0, ans=0.125
2024-08-13 18:54:05,468 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 20 from Vox, 30 from AS
2024-08-13 18:54:13,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2266610.0, ans=0.125
2024-08-13 18:54:13,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0
2024-08-13 18:54:14,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2266610.0, ans=0.125
2024-08-13 18:54:32,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9300, loss[loss=0.08456, beats_loss=0.01262, ecapa_loss=0.000135, whisper_loss=0.0706, over 21770.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001636, whisper_loss=0.09068, over 3901044.28 frames. ], batch size: 89, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:54:35,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2266810.0, ans=0.125
2024-08-13 18:54:49,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2266910.0, ans=0.125
2024-08-13 18:54:50,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2266910.0, ans=0.1
2024-08-13 18:55:02,947 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 from AS
2024-08-13 18:55:08,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2267010.0, ans=0.2
2024-08-13 18:55:14,309 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 from AS
2024-08-13 18:55:41,683 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9350, loss[loss=0.09913, beats_loss=0.008749, ecapa_loss=0.0001938, whisper_loss=0.08845, over 20622.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.0001641, whisper_loss=0.09058, over 3879755.25 frames. ], batch size: 85, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:55:48,863 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 14 from Vox, 28 from AS
2024-08-13 18:55:58,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.391e+01 2.659e+01 2.911e+01 1.966e+02, threshold=5.317e+01, percent-clipped=2.0
2024-08-13 18:56:03,535 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 from AS
2024-08-13 18:56:18,647 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 from AS
2024-08-13 18:56:21,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2267610.0, ans=0.2
2024-08-13 18:56:24,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2267610.0, ans=0.0
2024-08-13 18:56:45,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5
2024-08-13 18:56:48,881 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9400, loss[loss=0.1112, beats_loss=0.009943, ecapa_loss=0.0001729, whisper_loss=0.0995, over 22803.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001634, whisper_loss=0.09057, over 3882587.20 frames. ], batch size: 92, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:56:50,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2267810.0, ans=0.125
2024-08-13 18:56:58,277 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:56:59,313 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 15 from LS+wenet, 32 from Vox, 40 from AS
2024-08-13 18:57:04,580 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 18:57:15,742 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 from AS
2024-08-13 18:57:18,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=2268010.0, ans=12.0
2024-08-13 18:57:21,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2268010.0, ans=0.05
2024-08-13 18:57:21,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2268010.0, ans=0.0
2024-08-13 18:57:25,996 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 from AS
2024-08-13 18:57:26,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2268010.0, ans=0.0
2024-08-13 18:57:32,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2268110.0, ans=0.0
2024-08-13 18:57:35,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2268110.0, ans=0.2
2024-08-13 18:57:46,294 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 10 from Vox, 29 from AS
2024-08-13 18:57:54,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9450, loss[loss=0.1098, beats_loss=0.01207, ecapa_loss=0.0001549, whisper_loss=0.09615, over 22023.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01088, ecapa_loss=0.000163, whisper_loss=0.08995, over 3860395.63 frames. ], batch size: 89, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:57:55,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2268310.0, ans=0.07
2024-08-13 18:57:59,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0
2024-08-13 18:58:00,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. limit=6.0
2024-08-13 18:58:08,191 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 27 from Vox, 24 from AS
2024-08-13 18:58:12,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.364e+01 2.605e+01 2.951e+01 9.303e+01, threshold=5.211e+01, percent-clipped=2.0
2024-08-13 18:58:18,392 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 from AS
2024-08-13 18:58:27,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2268510.0, ans=0.125
2024-08-13 18:58:35,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0
2024-08-13 18:58:55,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2268710.0, ans=0.05
2024-08-13 18:59:00,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9500, loss[loss=0.08886, beats_loss=0.009388, ecapa_loss=0.0001374, whisper_loss=0.0781, over 19677.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01091, ecapa_loss=0.0001614, whisper_loss=0.08973, over 3883344.95 frames. ], batch size: 74, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:59:27,451 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 26 from Vox, 30 from AS
2024-08-13 18:59:27,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2269010.0, ans=0.1
2024-08-13 18:59:42,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0
2024-08-13 19:00:06,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9550, loss[loss=0.1194, beats_loss=0.00828, ecapa_loss=0.0002032, whisper_loss=0.1091, over 15635.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01095, ecapa_loss=0.0001624, whisper_loss=0.08925, over 3895688.86 frames. ], batch size: 61, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:00:23,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.300e+01 2.521e+01 2.795e+01 4.846e+01, threshold=5.041e+01, percent-clipped=0.0
2024-08-13 19:00:25,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2269410.0, ans=0.125
2024-08-13 19:00:37,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2269510.0, ans=0.125
2024-08-13 19:00:40,691 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 from AS
2024-08-13 19:00:43,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2269510.0, ans=0.125
2024-08-13 19:01:01,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2269710.0, ans=0.125
2024-08-13 19:01:07,799 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 from AS
2024-08-13 19:01:11,695 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9600, loss[loss=0.123, beats_loss=0.009262, ecapa_loss=0.0001437, whisper_loss=0.1123, over 16266.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01085, ecapa_loss=0.0001619, whisper_loss=0.08975, over 3858191.56 frames. ], batch size: 61, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:01:18,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2269810.0, ans=0.125
2024-08-13 19:01:19,636 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 17 from Vox, 26 from AS
2024-08-13 19:01:21,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0
2024-08-13 19:01:29,538 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS
2024-08-13 19:01:42,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2270010.0, ans=0.0
2024-08-13 19:01:55,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2270110.0, ans=0.125
2024-08-13 19:02:06,124 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 from AS
2024-08-13 19:02:07,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2270210.0, ans=0.125
2024-08-13 19:02:14,197 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 from AS
2024-08-13 19:02:16,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9650, loss[loss=0.09736, beats_loss=0.01209, ecapa_loss=0.0001785, whisper_loss=0.08348, over 23904.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001639, whisper_loss=0.08987, over 3837374.36 frames. ], batch size: 97, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:02:19,251 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 from AS
2024-08-13 19:02:31,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2270410.0, ans=0.0
2024-08-13 19:02:33,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.348e+01 2.592e+01 2.887e+01 4.146e+01, threshold=5.184e+01, percent-clipped=0.0
2024-08-13 19:02:43,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2270510.0, ans=0.5
2024-08-13 19:02:47,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2024-08-13 19:02:49,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2270510.0, ans=0.1
2024-08-13 19:02:52,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2270510.0, ans=0.5
2024-08-13 19:02:55,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2270610.0, ans=0.0
2024-08-13 19:03:02,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0
2024-08-13 19:03:11,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2270710.0, ans=0.0
2024-08-13 19:03:18,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2270710.0, ans=10.0
2024-08-13 19:03:21,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9700, loss[loss=0.1102, beats_loss=0.01155, ecapa_loss=0.0001655, whisper_loss=0.09702, over 19611.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.000163, whisper_loss=0.09064, over 3882132.26 frames. ], batch size: 76, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:03:35,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2270910.0, ans=0.125
2024-08-13 19:04:08,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0
2024-08-13 19:04:08,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2271110.0, ans=0.125
2024-08-13 19:04:10,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2271110.0, ans=0.125
2024-08-13 19:04:24,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2271210.0, ans=0.1
2024-08-13 19:04:26,953 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9750, loss[loss=0.116, beats_loss=0.01005, ecapa_loss=0.0001514, whisper_loss=0.1045, over 16875.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.000162, whisper_loss=0.09021, over 3842064.32 frames. ], batch size: 62, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:04:40,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2271410.0, ans=0.0
2024-08-13 19:04:43,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.462e+01 2.717e+01 3.058e+01 5.863e+01, threshold=5.433e+01, percent-clipped=1.0
2024-08-13 19:05:18,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0
2024-08-13 19:05:23,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=12.0
2024-08-13 19:05:32,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9800, loss[loss=0.06908, beats_loss=0.01414, ecapa_loss=0.0001431, whisper_loss=0.05351, over 15874.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001622, whisper_loss=0.09023, over 3841755.58 frames. ], batch size: 64, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:05:34,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2271810.0, ans=0.125
2024-08-13 19:05:34,939 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 24 from Vox, 32 from AS
2024-08-13 19:06:00,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2272010.0, ans=0.125
2024-08-13 19:06:04,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2272010.0, ans=0.125
2024-08-13 19:06:07,085 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 12 from LS+wenet, 18 from Vox, 37 from AS
2024-08-13 19:06:18,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2272110.0, ans=0.0
2024-08-13 19:06:37,797 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9850, loss[loss=0.09983, beats_loss=0.01207, ecapa_loss=0.000131, whisper_loss=0.08645, over 22605.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001632, whisper_loss=0.09056, over 3827083.94 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:06:40,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2272310.0, ans=0.125
2024-08-13 19:06:42,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2272310.0, ans=0.1
2024-08-13 19:06:44,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2272310.0, ans=0.125
2024-08-13 19:06:49,301 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 18 from Vox, 41 from AS
2024-08-13 19:06:52,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.53 vs. limit=15.0
2024-08-13 19:06:54,552 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.358e+01 2.690e+01 3.043e+01 6.098e+01, threshold=5.380e+01, percent-clipped=1.0
2024-08-13 19:07:00,193 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 25 from Vox, 29 from AS
2024-08-13 19:07:12,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2272510.0, ans=0.0
2024-08-13 19:07:35,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2272710.0, ans=0.2
2024-08-13 19:07:39,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.10 vs. limit=10.0
2024-08-13 19:07:42,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9900, loss[loss=0.1104, beats_loss=0.0106, ecapa_loss=0.0001572, whisper_loss=0.09819, over 22615.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001624, whisper_loss=0.09099, over 3873703.86 frames.
], batch size: 89, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:07:42,875 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 19:07:52,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.64 vs. limit=22.5 2024-08-13 19:07:57,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-13 19:08:22,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2273110.0, ans=0.0 2024-08-13 19:08:30,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2273110.0, ans=0.0 2024-08-13 19:08:33,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=15.0 2024-08-13 19:08:43,567 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 19:08:47,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 9950, loss[loss=0.1088, beats_loss=0.01203, ecapa_loss=0.0001642, whisper_loss=0.09516, over 22475.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001621, whisper_loss=0.09111, over 3885217.02 frames. ], batch size: 92, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:08:56,367 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
38 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 19:09:04,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.452e+01 2.692e+01 3.113e+01 1.874e+02, threshold=5.385e+01, percent-clipped=3.0 2024-08-13 19:09:09,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2273410.0, ans=0.0 2024-08-13 19:09:25,750 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 19:09:28,379 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 19:09:33,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2273610.0, ans=0.125 2024-08-13 19:09:42,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.75 vs. limit=15.0 2024-08-13 19:09:52,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10000, loss[loss=0.1136, beats_loss=0.009955, ecapa_loss=0.0001566, whisper_loss=0.1021, over 13513.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01077, ecapa_loss=0.0001618, whisper_loss=0.09227, over 3895144.94 frames. ], batch size: 53, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:09:54,836 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 19:09:58,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2273810.0, ans=0.1 2024-08-13 19:10:00,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.31 vs. 
limit=15.0 2024-08-13 19:10:07,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-13 19:10:10,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2273910.0, ans=0.0 2024-08-13 19:10:14,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2273910.0, ans=0.125 2024-08-13 19:10:37,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2274110.0, ans=0.125 2024-08-13 19:10:43,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2274210.0, ans=0.125 2024-08-13 19:10:57,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10050, loss[loss=0.1163, beats_loss=0.01079, ecapa_loss=0.0001399, whisper_loss=0.1041, over 22058.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001622, whisper_loss=0.09159, over 3897778.14 frames. ], batch size: 86, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:11:13,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2274410.0, ans=0.125 2024-08-13 19:11:14,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.462e+01 2.706e+01 3.125e+01 1.991e+02, threshold=5.413e+01, percent-clipped=1.0 2024-08-13 19:11:14,469 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 27 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-13 19:11:23,081 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 19:11:27,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2274510.0, ans=0.2 2024-08-13 19:11:32,125 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 19:11:41,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2274610.0, ans=0.125 2024-08-13 19:12:01,194 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10100, loss[loss=0.1056, beats_loss=0.01178, ecapa_loss=0.000148, whisper_loss=0.09237, over 22574.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01074, ecapa_loss=0.0001633, whisper_loss=0.09209, over 3893483.06 frames. ], batch size: 87, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:12:02,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2274810.0, ans=0.1 2024-08-13 19:12:08,147 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 19:12:11,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2024-08-13 19:12:15,608 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 14 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 19:12:15,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2274910.0, ans=0.125 2024-08-13 19:12:31,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. 
limit=6.0 2024-08-13 19:12:35,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2275010.0, ans=0.0 2024-08-13 19:12:55,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2275210.0, ans=0.125 2024-08-13 19:12:59,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2275210.0, ans=0.125 2024-08-13 19:13:04,029 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 19:13:04,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2275210.0, ans=0.0 2024-08-13 19:13:06,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10150, loss[loss=0.0734, beats_loss=0.01171, ecapa_loss=0.0001289, whisper_loss=0.0604, over 18270.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0108, ecapa_loss=0.0001629, whisper_loss=0.09151, over 3921551.57 frames. ], batch size: 71, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:13:17,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2275310.0, ans=0.1 2024-08-13 19:13:22,223 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 19:13:22,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.85 vs. 
limit=22.5 2024-08-13 19:13:24,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.395e+01 2.644e+01 2.917e+01 4.595e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-13 19:13:35,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2275510.0, ans=0.1 2024-08-13 19:13:46,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2275510.0, ans=0.125 2024-08-13 19:13:57,375 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 19:13:59,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2275610.0, ans=0.125 2024-08-13 19:14:02,769 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 19:14:11,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2275710.0, ans=0.0 2024-08-13 19:14:12,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2275710.0, ans=0.2 2024-08-13 19:14:15,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10200, loss[loss=0.1167, beats_loss=0.00933, ecapa_loss=0.0001846, whisper_loss=0.1056, over 16603.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01073, ecapa_loss=0.000164, whisper_loss=0.0921, over 3918410.04 frames. ], batch size: 68, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:14:51,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2276010.0, ans=0.09899494936611666 2024-08-13 19:15:31,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10250, loss[loss=0.101, beats_loss=0.0125, ecapa_loss=0.0001329, whisper_loss=0.08719, over 17459.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001627, whisper_loss=0.09176, over 3929938.50 frames. ], batch size: 67, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:15:38,115 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 19:15:53,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.373e+01 2.735e+01 3.155e+01 5.239e+01, threshold=5.471e+01, percent-clipped=0.0 2024-08-13 19:16:26,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2276610.0, ans=0.0 2024-08-13 19:16:32,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2276710.0, ans=0.125 2024-08-13 19:16:33,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2276710.0, ans=0.2 2024-08-13 19:16:38,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2276710.0, ans=0.5 2024-08-13 19:16:44,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2276710.0, ans=0.2 2024-08-13 19:16:46,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10300, loss[loss=0.09335, beats_loss=0.01292, ecapa_loss=0.0001495, whisper_loss=0.07894, over 22423.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001623, whisper_loss=0.09175, over 3937175.22 frames. 
], batch size: 94, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:16:54,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2276810.0, ans=0.125 2024-08-13 19:16:56,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2276810.0, ans=0.125 2024-08-13 19:16:58,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2276810.0, ans=0.125 2024-08-13 19:17:18,307 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 19:17:23,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2277010.0, ans=0.125 2024-08-13 19:17:23,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=12.0 2024-08-13 19:17:25,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2277010.0, ans=0.025 2024-08-13 19:17:29,286 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 19:17:35,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2277110.0, ans=0.125 2024-08-13 19:17:37,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2277110.0, ans=0.0 2024-08-13 19:17:40,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2277110.0, ans=0.125 2024-08-13 19:17:48,632 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-13 19:17:48,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2277210.0, ans=0.0 2024-08-13 19:17:49,925 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-13 19:17:51,146 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 19:18:02,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2277210.0, ans=0.0 2024-08-13 19:18:04,511 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10350, loss[loss=0.1241, beats_loss=0.008243, ecapa_loss=0.0001751, whisper_loss=0.1141, over 22903.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001621, whisper_loss=0.09216, over 3947921.25 frames. ], batch size: 88, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:18:12,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2277310.0, ans=15.0 2024-08-13 19:18:23,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2277410.0, ans=0.2 2024-08-13 19:18:23,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2277410.0, ans=0.025 2024-08-13 19:18:26,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.410e+01 2.741e+01 3.127e+01 1.313e+02, threshold=5.482e+01, percent-clipped=3.0 2024-08-13 19:18:30,856 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-13 19:18:36,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2277510.0, ans=0.0 2024-08-13 19:18:43,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2277510.0, ans=0.0 2024-08-13 19:18:51,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2277610.0, ans=0.125 2024-08-13 19:18:58,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2277610.0, ans=0.125 2024-08-13 19:19:14,374 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 19:19:14,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2277710.0, ans=0.0 2024-08-13 19:19:20,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10400, loss[loss=0.0772, beats_loss=0.01141, ecapa_loss=0.0001585, whisper_loss=0.0642, over 17441.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001626, whisper_loss=0.09168, over 3930349.38 frames. 
], batch size: 71, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:19:22,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2277810.0, ans=0.1 2024-08-13 19:19:25,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2277810.0, ans=0.1 2024-08-13 19:19:36,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2277910.0, ans=0.125 2024-08-13 19:19:39,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-13 19:20:04,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2278110.0, ans=0.2 2024-08-13 19:20:09,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2278110.0, ans=0.025 2024-08-13 19:20:20,270 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 19:20:33,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10450, loss[loss=0.1027, beats_loss=0.01216, ecapa_loss=0.0001462, whisper_loss=0.08908, over 21539.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001612, whisper_loss=0.09157, over 3920912.11 frames. ], batch size: 89, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:20:35,609 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 19:20:36,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2278310.0, ans=0.1 2024-08-13 19:20:52,530 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
19 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 19:20:55,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.408e+01 2.686e+01 2.992e+01 7.083e+01, threshold=5.372e+01, percent-clipped=1.0 2024-08-13 19:21:08,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2278510.0, ans=0.5 2024-08-13 19:21:12,891 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 19:21:36,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2278710.0, ans=0.0 2024-08-13 19:21:49,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10500, loss[loss=0.1185, beats_loss=0.01332, ecapa_loss=0.0001254, whisper_loss=0.1039, over 23486.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01077, ecapa_loss=0.0001608, whisper_loss=0.09152, over 3947075.81 frames. ], batch size: 91, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:22:05,564 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-13 19:22:18,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.67 vs. limit=22.5 2024-08-13 19:22:38,405 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 19:22:47,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2279110.0, ans=0.0 2024-08-13 19:22:55,203 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-13 19:22:56,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2279210.0, ans=0.0 2024-08-13 19:23:04,358 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 19:23:05,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10550, loss[loss=0.1036, beats_loss=0.01283, ecapa_loss=0.0001616, whisper_loss=0.08918, over 23461.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001606, whisper_loss=0.09069, over 3890787.55 frames. ], batch size: 94, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:23:12,417 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 19:23:16,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2279310.0, ans=0.0 2024-08-13 19:23:19,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=12.0 2024-08-13 19:23:24,323 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-13 19:23:29,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.431e+01 2.823e+01 3.244e+01 7.825e+01, threshold=5.646e+01, percent-clipped=1.0 2024-08-13 19:23:43,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2279510.0, ans=0.1 2024-08-13 19:23:45,196 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 19:23:56,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2279610.0, ans=0.0 2024-08-13 19:24:05,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2279610.0, ans=0.125 2024-08-13 19:24:10,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. 
limit=15.0 2024-08-13 19:24:12,719 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 14 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 19:24:22,750 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 19:24:25,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2279810.0, ans=0.0 2024-08-13 19:24:26,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10600, loss[loss=0.1295, beats_loss=0.0083, ecapa_loss=0.0001862, whisper_loss=0.1193, over 19277.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001602, whisper_loss=0.09052, over 3898871.19 frames. ], batch size: 77, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:24:37,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2279810.0, ans=0.0 2024-08-13 19:24:53,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2279910.0, ans=0.2 2024-08-13 19:24:57,306 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-228000.pt 2024-08-13 19:25:03,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.76 vs. 
limit=22.5
2024-08-13 19:25:04,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2280010.0, ans=0.0
2024-08-13 19:25:28,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2280110.0, ans=0.0
2024-08-13 19:25:51,506 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10650, loss[loss=0.1376, beats_loss=0.0079, ecapa_loss=0.0001826, whisper_loss=0.1279, over 20775.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.00016, whisper_loss=0.09117, over 3885835.62 frames. ], batch size: 81, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:25:58,836 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 26 from Vox, 42 fro AS
2024-08-13 19:26:15,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.293e+01 2.581e+01 2.913e+01 4.333e+01, threshold=5.161e+01, percent-clipped=0.0
2024-08-13 19:26:28,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=22.5
2024-08-13 19:26:29,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2280510.0, ans=0.125
2024-08-13 19:26:57,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2280710.0, ans=0.0
2024-08-13 19:27:13,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10700, loss[loss=0.09452, beats_loss=0.01092, ecapa_loss=0.0001828, whisper_loss=0.08177, over 16707.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001585, whisper_loss=0.09126, over 3895879.62 frames. ], batch size: 69, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:27:17,411 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 12 from LS+wenet, 21 from Vox, 28 fro AS
2024-08-13 19:27:17,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2280810.0, ans=0.125
2024-08-13 19:27:26,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2280810.0, ans=0.125
2024-08-13 19:27:56,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2281010.0, ans=0.0
2024-08-13 19:28:05,648 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS
2024-08-13 19:28:07,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2281110.0, ans=0.125
2024-08-13 19:28:14,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2281110.0, ans=0.035
2024-08-13 19:28:24,808 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS
2024-08-13 19:28:34,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10750, loss[loss=0.113, beats_loss=0.01074, ecapa_loss=0.0001398, whisper_loss=0.1008, over 22937.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01074, ecapa_loss=0.0001582, whisper_loss=0.09289, over 3930426.37 frames. ], batch size: 88, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:28:38,007 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS
2024-08-13 19:28:39,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0
2024-08-13 19:28:47,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2281310.0, ans=0.125
2024-08-13 19:28:57,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.479e+01 2.782e+01 3.163e+01 7.452e+01, threshold=5.564e+01, percent-clipped=1.0
2024-08-13 19:29:21,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2281610.0, ans=0.2
2024-08-13 19:29:40,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2281710.0, ans=0.125
2024-08-13 19:29:52,354 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS
2024-08-13 19:29:52,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2024-08-13 19:29:55,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10800, loss[loss=0.09771, beats_loss=0.01115, ecapa_loss=0.0001822, whisper_loss=0.08474, over 20846.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0108, ecapa_loss=0.0001586, whisper_loss=0.09238, over 3906426.26 frames. ], batch size: 87, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:29:57,446 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 15 from LS+wenet, 23 from Vox, 34 fro AS
2024-08-13 19:30:12,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2281910.0, ans=10.0
2024-08-13 19:30:25,756 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 fro AS
2024-08-13 19:30:31,378 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS
2024-08-13 19:30:55,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2282110.0, ans=0.1
2024-08-13 19:31:02,280 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 fro AS
2024-08-13 19:31:15,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2282310.0, ans=0.1
2024-08-13 19:31:16,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10850, loss[loss=0.1005, beats_loss=0.01063, ecapa_loss=0.0001904, whisper_loss=0.08795, over 20816.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01078, ecapa_loss=0.0001588, whisper_loss=0.0924, over 3890774.28 frames. ], batch size: 88, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:31:22,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=22.5
2024-08-13 19:31:29,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2282310.0, ans=0.1
2024-08-13 19:31:38,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.579e+01 2.775e+01 3.149e+01 7.029e+01, threshold=5.550e+01, percent-clipped=1.0
2024-08-13 19:31:41,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0
2024-08-13 19:31:47,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2282510.0, ans=0.125
2024-08-13 19:31:47,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0
2024-08-13 19:31:58,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=15.0
2024-08-13 19:32:24,286 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS
2024-08-13 19:32:25,684 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS
2024-08-13 19:32:33,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2282710.0, ans=0.1
2024-08-13 19:32:40,035 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10900, loss[loss=0.1108, beats_loss=0.009005, ecapa_loss=0.0001853, whisper_loss=0.0999, over 14055.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001597, whisper_loss=0.09256, over 3906476.24 frames. ], batch size: 53, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:33:00,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2282910.0, ans=0.125
2024-08-13 19:33:17,951 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS
2024-08-13 19:33:34,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2283110.0, ans=0.125
2024-08-13 19:33:45,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2283210.0, ans=0.125
2024-08-13 19:34:00,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 10950, loss[loss=0.1019, beats_loss=0.009999, ecapa_loss=0.0001901, whisper_loss=0.08996, over 13323.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01077, ecapa_loss=0.0001591, whisper_loss=0.09251, over 3893863.11 frames. ], batch size: 55, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:34:04,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2283310.0, ans=0.125
2024-08-13 19:34:06,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2024-08-13 19:34:09,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5
2024-08-13 19:34:23,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.358e+01 2.590e+01 2.815e+01 3.849e+01, threshold=5.179e+01, percent-clipped=0.0
2024-08-13 19:34:32,430 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS
2024-08-13 19:35:16,534 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS
2024-08-13 19:35:22,595 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11000, loss[loss=0.0905, beats_loss=0.01106, ecapa_loss=0.0001681, whisper_loss=0.07777, over 20291.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001598, whisper_loss=0.09174, over 3866653.96 frames. ], batch size: 81, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:35:31,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2283810.0, ans=0.0
2024-08-13 19:35:32,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.00 vs. limit=22.5
2024-08-13 19:35:35,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2283810.0, ans=0.2
2024-08-13 19:35:35,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2283810.0, ans=0.0
2024-08-13 19:35:37,791 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS
2024-08-13 19:35:41,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2283910.0, ans=15.0
2024-08-13 19:35:46,662 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 17 from Vox, 49 fro AS
2024-08-13 19:35:55,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2284010.0, ans=0.125
2024-08-13 19:36:07,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5
2024-08-13 19:36:28,439 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS
2024-08-13 19:36:45,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11050, loss[loss=0.1065, beats_loss=0.0107, ecapa_loss=0.0001525, whisper_loss=0.09425, over 21650.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001609, whisper_loss=0.09178, over 3871555.35 frames. ], batch size: 86, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:36:52,409 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS
2024-08-13 19:37:01,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. limit=5.0
2024-08-13 19:37:08,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.460e+01 2.684e+01 3.016e+01 4.539e+01, threshold=5.368e+01, percent-clipped=0.0
2024-08-13 19:37:25,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2024-08-13 19:37:26,880 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 30 from LS+wenet, 15 from Vox, 17 fro AS
2024-08-13 19:37:42,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2284610.0, ans=0.0
2024-08-13 19:37:51,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2284710.0, ans=0.2
2024-08-13 19:37:52,701 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 27 from Vox, 19 fro AS
2024-08-13 19:38:01,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2284710.0, ans=0.125
2024-08-13 19:38:03,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0
2024-08-13 19:38:07,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11100, loss[loss=0.116, beats_loss=0.009984, ecapa_loss=0.0001522, whisper_loss=0.1045, over 23332.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001615, whisper_loss=0.09162, over 3895987.36 frames. ], batch size: 89, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:38:12,830 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 20 from Vox, 24 fro AS
2024-08-13 19:38:21,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2284810.0, ans=0.125
2024-08-13 19:38:25,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2284910.0, ans=0.125
2024-08-13 19:38:37,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0
2024-08-13 19:38:46,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2285010.0, ans=0.125
2024-08-13 19:39:01,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2285110.0, ans=0.125
2024-08-13 19:39:29,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11150, loss[loss=0.1239, beats_loss=0.009145, ecapa_loss=0.0001604, whisper_loss=0.1132, over 22952.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01074, ecapa_loss=0.000161, whisper_loss=0.0913, over 3899565.36 frames. ], batch size: 89, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:39:38,045 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS
2024-08-13 19:39:40,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2285310.0, ans=0.125
2024-08-13 19:39:47,940 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS
2024-08-13 19:39:49,335 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 19 from Vox, 17 fro AS
2024-08-13 19:39:52,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.378e+01 2.624e+01 2.890e+01 4.520e+01, threshold=5.247e+01, percent-clipped=0.0
2024-08-13 19:40:13,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=12.0
2024-08-13 19:40:16,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2285510.0, ans=0.0
2024-08-13 19:40:27,086 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS
2024-08-13 19:40:30,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5
2024-08-13 19:40:52,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11200, loss[loss=0.07406, beats_loss=0.01637, ecapa_loss=0.0001354, whisper_loss=0.05633, over 16974.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001617, whisper_loss=0.09148, over 3879149.86 frames. ], batch size: 70, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:41:03,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0
2024-08-13 19:41:25,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2286010.0, ans=0.0
2024-08-13 19:41:46,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2286110.0, ans=0.0
2024-08-13 19:41:51,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2286110.0, ans=0.2
2024-08-13 19:42:05,636 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS
2024-08-13 19:42:10,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2286210.0, ans=0.0
2024-08-13 19:42:13,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11250, loss[loss=0.1127, beats_loss=0.01006, ecapa_loss=0.0001568, whisper_loss=0.1011, over 21431.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.000162, whisper_loss=0.09088, over 3900110.49 frames. ], batch size: 83, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:42:31,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2286410.0, ans=0.125
2024-08-13 19:42:31,350 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 19:42:32,255 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS
2024-08-13 19:42:34,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.479e+01 2.663e+01 3.017e+01 9.282e+01, threshold=5.327e+01, percent-clipped=2.0
2024-08-13 19:42:44,734 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS
2024-08-13 19:42:58,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2286610.0, ans=0.2
2024-08-13 19:43:06,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2286610.0, ans=0.035
2024-08-13 19:43:09,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2286610.0, ans=0.125
2024-08-13 19:43:14,340 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 16 from Vox, 45 fro AS
2024-08-13 19:43:14,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2286710.0, ans=0.125
2024-08-13 19:43:20,645 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 23 from Vox, 48 fro AS
2024-08-13 19:43:21,875 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 16 from Vox, 37 fro AS
2024-08-13 19:43:23,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2286710.0, ans=0.0
2024-08-13 19:43:31,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11300, loss[loss=0.1043, beats_loss=0.007858, ecapa_loss=0.0002489, whisper_loss=0.09392, over 21808.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001631, whisper_loss=0.09077, over 3944022.21 frames. ], batch size: 95, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:44:14,201 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 20 from Vox, 49 fro AS
2024-08-13 19:44:35,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2287210.0, ans=0.125
2024-08-13 19:44:53,126 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11350, loss[loss=0.1146, beats_loss=0.01055, ecapa_loss=0.0001564, whisper_loss=0.1025, over 14484.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001633, whisper_loss=0.09128, over 3910375.65 frames. ], batch size: 56, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:45:15,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.325e+01 2.679e+01 3.004e+01 6.399e+01, threshold=5.358e+01, percent-clipped=2.0
2024-08-13 19:45:16,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2287410.0, ans=0.125
2024-08-13 19:45:38,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2287510.0, ans=0.125
2024-08-13 19:45:59,900 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-13 19:46:02,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2287710.0, ans=0.125
2024-08-13 19:46:14,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11400, loss[loss=0.1243, beats_loss=0.008529, ecapa_loss=0.0001982, whisper_loss=0.1138, over 23361.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0106, ecapa_loss=0.0001642, whisper_loss=0.09156, over 3869522.79 frames. ], batch size: 94, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:46:37,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0
2024-08-13 19:46:43,125 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 fro AS
2024-08-13 19:46:51,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2288010.0, ans=0.05
2024-08-13 19:47:15,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2288110.0, ans=0.125
2024-08-13 19:47:26,593 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS
2024-08-13 19:47:31,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2288210.0, ans=0.0
2024-08-13 19:47:31,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2288210.0, ans=0.0
2024-08-13 19:47:33,930 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11450, loss[loss=0.0854, beats_loss=0.01038, ecapa_loss=0.0001647, whisper_loss=0.07337, over 15195.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001635, whisper_loss=0.09086, over 3856076.25 frames. ], batch size: 61, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:47:47,854 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS
2024-08-13 19:47:52,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2288410.0, ans=0.1
2024-08-13 19:47:55,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.447e+01 2.680e+01 2.957e+01 5.322e+01, threshold=5.359e+01, percent-clipped=0.0
2024-08-13 19:48:17,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2288510.0, ans=0.125
2024-08-13 19:48:28,115 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS
2024-08-13 19:48:34,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2024-08-13 19:48:42,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2288710.0, ans=0.05
2024-08-13 19:48:51,791 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11500, loss[loss=0.1177, beats_loss=0.01064, ecapa_loss=0.0001462, whisper_loss=0.1056, over 23025.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.000163, whisper_loss=0.09054, over 3850338.68 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:49:08,954 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS
2024-08-13 19:49:39,811 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 11 from LS+wenet, 11 from Vox, 36 fro AS
2024-08-13 19:49:53,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2289110.0, ans=0.0
2024-08-13 19:50:13,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11550, loss[loss=0.1047, beats_loss=0.008521, ecapa_loss=0.0002067, whisper_loss=0.09411, over 18306.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.0001625, whisper_loss=0.09045, over 3853065.58 frames. ], batch size: 73, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:50:35,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2289410.0, ans=0.0
2024-08-13 19:50:36,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.552e+01 2.829e+01 3.234e+01 6.675e+01, threshold=5.658e+01, percent-clipped=2.0
2024-08-13 19:50:42,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2289410.0, ans=0.125
2024-08-13 19:50:57,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2289510.0, ans=0.025
2024-08-13 19:51:00,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2289510.0, ans=0.1
2024-08-13 19:51:00,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0
2024-08-13 19:51:08,863 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS
2024-08-13 19:51:35,570 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11600, loss[loss=0.07364, beats_loss=0.01132, ecapa_loss=0.0001746, whisper_loss=0.06058, over 13814.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.000164, whisper_loss=0.09016, over 3862525.48 frames. ], batch size: 58, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:52:10,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2290010.0, ans=0.125
2024-08-13 19:52:16,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2290010.0, ans=0.2
2024-08-13 19:52:31,463 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS
2024-08-13 19:52:45,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=15.0
2024-08-13 19:52:59,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11650, loss[loss=0.1186, beats_loss=0.00762, ecapa_loss=0.0002186, whisper_loss=0.1088, over 21356.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001632, whisper_loss=0.09046, over 3868740.52 frames. ], batch size: 85, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:52:59,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2290310.0, ans=0.1
2024-08-13 19:53:00,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0
2024-08-13 19:53:10,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0
2024-08-13 19:53:12,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2290310.0, ans=0.07
2024-08-13 19:53:22,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.408e+01 2.632e+01 2.967e+01 4.953e+01, threshold=5.264e+01, percent-clipped=0.0
2024-08-13 19:53:37,648 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-13 19:53:52,510 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-13 19:53:59,490 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 19 from Vox, 28 fro AS
2024-08-13 19:54:21,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2290710.0, ans=0.1
2024-08-13 19:54:23,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11700, loss[loss=0.1395, beats_loss=0.007302, ecapa_loss=0.0001789, whisper_loss=0.1304, over 15682.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01095, ecapa_loss=0.0001637, whisper_loss=0.09053, over 3881219.93 frames. ], batch size: 56, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:54:28,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2290810.0, ans=0.125
2024-08-13 19:54:32,534 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.238e+00
2024-08-13 19:54:40,761 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 20 from Vox, 39 fro AS
2024-08-13 19:54:42,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2290910.0, ans=0.2
2024-08-13 19:54:47,776 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS
2024-08-13 19:54:47,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2290910.0, ans=0.0
2024-08-13 19:55:10,009 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS
2024-08-13 19:55:19,541 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 fro AS
2024-08-13 19:55:20,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=12.0
2024-08-13 19:55:29,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0
2024-08-13 19:55:46,652 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11750, loss[loss=0.1183, beats_loss=0.00937, ecapa_loss=0.0001781, whisper_loss=0.1072, over 22292.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01096, ecapa_loss=0.0001638, whisper_loss=0.09072, over 3867809.10 frames. ], batch size: 88, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:55:56,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2291310.0, ans=0.125
2024-08-13 19:56:11,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.398e+01 2.617e+01 2.949e+01 4.150e+01, threshold=5.234e+01, percent-clipped=0.0
2024-08-13 19:56:48,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2291610.0, ans=0.1
2024-08-13 19:56:51,178 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 29 from Vox, 24 fro AS
2024-08-13 19:57:09,101 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11800, loss[loss=0.1189, beats_loss=0.008531, ecapa_loss=0.0001462, whisper_loss=0.1089, over 15697.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001621, whisper_loss=0.09158, over 3865696.11 frames. ], batch size: 58, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:57:27,512 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS
2024-08-13 19:57:34,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2291910.0, ans=0.125
2024-08-13 19:58:20,257 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS
2024-08-13 19:58:28,486 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 23 from Vox, 19 fro AS
2024-08-13 19:58:32,701 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11850, loss[loss=0.09261, beats_loss=0.01007, ecapa_loss=0.0001868, whisper_loss=0.08068, over 19794.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.000163, whisper_loss=0.09125, over 3892743.04 frames. ], batch size: 81, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:58:42,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2292310.0, ans=0.125
2024-08-13 19:58:45,750 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS
2024-08-13 19:58:46,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2292310.0, ans=0.1
2024-08-13 19:58:55,608 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.456e+01 2.721e+01 2.965e+01 7.443e+01, threshold=5.443e+01, percent-clipped=2.0
2024-08-13 19:59:03,780 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 fro AS
2024-08-13 19:59:15,245 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS
2024-08-13 19:59:37,231 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 fro AS
2024-08-13 19:59:49,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2292710.0, ans=0.0
2024-08-13 19:59:52,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11900, loss[loss=0.1294, beats_loss=0.007902, ecapa_loss=0.0001803, whisper_loss=0.1197, over 19341.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01092, ecapa_loss=0.0001619, whisper_loss=0.09102, over 3909188.32 frames. ], batch size: 77, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:00:03,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2292810.0, ans=0.1
2024-08-13 20:00:42,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0
2024-08-13 20:01:01,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2293210.0, ans=0.125
2024-08-13 20:01:05,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2293210.0, ans=0.1
2024-08-13 20:01:12,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2293310.0, ans=0.125
2024-08-13 20:01:13,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 11950, loss[loss=0.1028, beats_loss=0.008148, ecapa_loss=0.000186, whisper_loss=0.09281, over 20612.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001637, whisper_loss=0.09177, over 3899479.53 frames. ], batch size: 79, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:01:35,779 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.292e+01 2.621e+01 2.963e+01 5.710e+01, threshold=5.241e+01, percent-clipped=1.0
2024-08-13 20:01:39,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2293410.0, ans=0.125
2024-08-13 20:01:44,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2293510.0, ans=0.04949747468305833
2024-08-13 20:02:01,577 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 fro AS
2024-08-13 20:02:04,590 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 15 from Vox, 44 fro AS
2024-08-13 20:02:08,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2293610.0, ans=0.0
2024-08-13 20:02:15,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2024-08-13 20:02:23,425 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 fro AS
2024-08-13 20:02:25,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2293710.0, ans=10.0
2024-08-13 20:02:32,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12000, loss[loss=0.1119, beats_loss=0.01024, ecapa_loss=0.0001781, whisper_loss=0.09986, over 23610.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01083, ecapa_loss=0.0001639, whisper_loss=0.09093, over 3912351.13 frames. ], batch size: 96, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:02:32,126 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-13 20:02:50,875 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.7636, 3.3924, 3.3437, 3.4354], device='cuda:0')
2024-08-13 20:03:12,728 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005542, whisper_loss=0.248, over 922467.00 frames.
2024-08-13 20:03:33,762 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on SV_voxceleb1: loss=0.004415, beats_loss=0, ecapa_loss=0.0004415, whisper_loss=0, over 939242.00 frames.
2024-08-13 20:04:36,283 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.0854, 1.7610, 1.8071, 1.6994], device='cuda:0') 2024-08-13 20:05:21,730 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on AT_audioset: loss=0.02371, beats_loss=0.02371, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 20:05:21,734 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-13 20:05:34,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2293810.0, ans=0.0 2024-08-13 20:05:57,314 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 20:06:13,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2294110.0, ans=0.0 2024-08-13 20:06:24,248 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 20:06:32,690 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 20:06:39,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2024-08-13 20:06:44,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12050, loss[loss=0.09457, beats_loss=0.01229, ecapa_loss=0.000156, whisper_loss=0.08072, over 17125.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001634, whisper_loss=0.09136, over 3914817.92 frames. ], batch size: 68, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:06:46,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.95 vs. 
limit=15.0 2024-08-13 20:06:58,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2294310.0, ans=0.125 2024-08-13 20:07:07,489 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.498e+01 2.752e+01 3.060e+01 1.760e+02, threshold=5.504e+01, percent-clipped=2.0 2024-08-13 20:07:28,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5 2024-08-13 20:07:33,631 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 20:07:37,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2294610.0, ans=0.125 2024-08-13 20:07:41,856 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 20:07:42,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2294610.0, ans=0.125 2024-08-13 20:07:53,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-13 20:07:55,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2294710.0, ans=0.0 2024-08-13 20:08:08,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12100, loss[loss=0.07639, beats_loss=0.01249, ecapa_loss=0.0001936, whisper_loss=0.06196, over 20727.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001638, whisper_loss=0.09129, over 3912077.66 frames. 
], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:08:16,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2294810.0, ans=10.0 2024-08-13 20:08:28,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2294910.0, ans=0.1 2024-08-13 20:08:32,381 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 20:08:32,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2294910.0, ans=0.05 2024-08-13 20:08:40,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2295010.0, ans=0.125 2024-08-13 20:08:52,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2295010.0, ans=0.125 2024-08-13 20:09:17,888 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 20:09:26,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2295210.0, ans=0.125 2024-08-13 20:09:27,785 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-13 20:09:29,300 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12150, loss[loss=0.08945, beats_loss=0.01273, ecapa_loss=0.0002149, whisper_loss=0.07457, over 20957.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001644, whisper_loss=0.09084, over 3891715.19 frames. 
], batch size: 91, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:09:38,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2295310.0, ans=0.1 2024-08-13 20:09:41,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2295310.0, ans=0.1 2024-08-13 20:09:44,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2295410.0, ans=0.2 2024-08-13 20:09:52,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.320e+01 2.600e+01 2.810e+01 5.391e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-13 20:09:55,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2295410.0, ans=0.2 2024-08-13 20:10:00,033 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 20:10:09,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2295510.0, ans=0.125 2024-08-13 20:10:13,170 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 20:10:22,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2295610.0, ans=0.025 2024-08-13 20:10:42,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2295710.0, ans=0.1 2024-08-13 20:10:42,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2295710.0, ans=0.125 2024-08-13 20:10:55,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12200, loss[loss=0.1087, beats_loss=0.008521, ecapa_loss=0.0001852, whisper_loss=0.09835, over 15852.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.000164, whisper_loss=0.09042, over 3870730.59 frames. ], batch size: 63, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:11:11,079 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 20:11:11,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2295910.0, ans=0.0 2024-08-13 20:11:14,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2295910.0, ans=0.125 2024-08-13 20:11:20,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2295910.0, ans=0.125 2024-08-13 20:11:23,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2295910.0, ans=0.0 2024-08-13 20:11:52,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2296110.0, ans=0.125 2024-08-13 20:11:52,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2296110.0, ans=0.125 2024-08-13 20:11:56,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2296110.0, ans=0.125 2024-08-13 20:12:00,938 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 20:12:16,958 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12250, loss[loss=0.09209, beats_loss=0.0109, ecapa_loss=0.0002069, whisper_loss=0.07913, over 18855.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001642, whisper_loss=0.09128, over 3881770.23 frames. ], batch size: 86, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:12:34,688 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 20:12:38,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.422e+01 2.679e+01 2.912e+01 4.424e+01, threshold=5.358e+01, percent-clipped=0.0 2024-08-13 20:12:49,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2296510.0, ans=0.5 2024-08-13 20:12:55,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-08-13 20:13:09,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2296610.0, ans=0.125 2024-08-13 20:13:36,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12300, loss[loss=0.1067, beats_loss=0.008892, ecapa_loss=0.0001802, whisper_loss=0.09605, over 21404.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001645, whisper_loss=0.09172, over 3890862.70 frames. ], batch size: 86, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:13:46,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2296810.0, ans=0.0 2024-08-13 20:13:53,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2296910.0, ans=0.125 2024-08-13 20:13:56,623 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 20:14:45,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2297210.0, ans=0.0 2024-08-13 20:14:51,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2297210.0, ans=0.2 2024-08-13 20:14:55,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12350, loss[loss=0.1074, beats_loss=0.01095, ecapa_loss=0.0001696, whisper_loss=0.09474, over 18344.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001649, whisper_loss=0.09156, over 3876862.10 frames. ], batch size: 71, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:15:04,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2297310.0, ans=0.125 2024-08-13 20:15:18,218 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.416e+01 2.631e+01 3.029e+01 4.449e+01, threshold=5.262e+01, percent-clipped=0.0 2024-08-13 20:15:36,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.92 vs. limit=22.5 2024-08-13 20:15:40,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2297510.0, ans=0.125 2024-08-13 20:15:44,323 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 20:15:49,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.54 vs. limit=22.5 2024-08-13 20:16:00,537 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 20:16:04,031 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-13 20:16:13,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2297710.0, ans=0.125 2024-08-13 20:16:18,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12400, loss[loss=0.1183, beats_loss=0.01217, ecapa_loss=0.0001385, whisper_loss=0.1048, over 24007.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01074, ecapa_loss=0.0001627, whisper_loss=0.0919, over 3890636.04 frames. ], batch size: 93, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:16:24,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2024-08-13 20:16:27,638 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 20:16:30,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2297810.0, ans=0.2 2024-08-13 20:16:59,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2298010.0, ans=0.0 2024-08-13 20:17:08,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-13 20:17:08,913 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 15 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 20:17:12,372 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 20:17:17,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2298110.0, ans=0.0 2024-08-13 20:17:18,620 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.795e+05 2024-08-13 20:17:30,237 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 20:17:36,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12450, loss[loss=0.09163, beats_loss=0.01243, ecapa_loss=0.0001292, whisper_loss=0.07791, over 20293.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.0001628, whisper_loss=0.09205, over 3882652.48 frames. ], batch size: 82, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:17:45,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2298310.0, ans=0.125 2024-08-13 20:17:51,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2298410.0, ans=0.125 2024-08-13 20:17:57,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.445e+01 2.805e+01 3.307e+01 1.075e+02, threshold=5.611e+01, percent-clipped=1.0 2024-08-13 20:17:57,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2298410.0, ans=0.0 2024-08-13 20:18:06,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2298510.0, ans=0.0 2024-08-13 20:18:10,252 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 20:18:33,011 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 20:18:46,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2298710.0, ans=0.125 2024-08-13 20:18:53,376 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12500, loss[loss=0.09825, beats_loss=0.009905, ecapa_loss=0.0001591, whisper_loss=0.08675, over 23448.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01075, ecapa_loss=0.000163, whisper_loss=0.09237, over 3883577.56 frames. ], batch size: 94, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:19:00,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2298810.0, ans=0.125 2024-08-13 20:19:25,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5 2024-08-13 20:19:39,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2299010.0, ans=0.1 2024-08-13 20:19:45,457 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-13 20:19:45,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2299110.0, ans=0.125 2024-08-13 20:19:52,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=12.0 2024-08-13 20:20:01,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2299210.0, ans=0.0 2024-08-13 20:20:12,102 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 20:20:12,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2024-08-13 20:20:13,668 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12550, loss[loss=0.1055, beats_loss=0.01123, ecapa_loss=0.0001527, whisper_loss=0.09271, over 17372.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01072, ecapa_loss=0.0001623, whisper_loss=0.09209, over 3864335.12 frames. ], batch size: 70, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:20:14,180 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-13 20:20:15,768 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 20:20:17,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2024-08-13 20:20:35,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2299410.0, ans=0.0 2024-08-13 20:20:37,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.461e+01 2.791e+01 3.123e+01 5.243e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 20:20:43,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2299410.0, ans=0.125 2024-08-13 20:21:02,101 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 20:21:07,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2299610.0, ans=0.0 2024-08-13 20:21:12,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2299610.0, ans=0.125 2024-08-13 20:21:13,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2299610.0, ans=0.0 2024-08-13 20:21:14,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2299610.0, ans=0.125 2024-08-13 20:21:23,907 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-13 20:21:32,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12600, loss[loss=0.1028, beats_loss=0.009385, ecapa_loss=0.0002253, whisper_loss=0.09116, over 22251.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001618, whisper_loss=0.09122, over 3877959.56 frames. ], batch size: 95, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:21:43,142 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-13 20:21:46,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2299810.0, ans=0.1 2024-08-13 20:21:56,902 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 20:22:01,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2299910.0, ans=0.125 2024-08-13 20:22:09,032 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 20:22:15,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2300010.0, ans=0.125 2024-08-13 20:22:16,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2300010.0, ans=0.0 2024-08-13 20:22:32,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2300110.0, ans=0.1 2024-08-13 20:22:36,110 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 20:22:41,787 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-13 20:22:45,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2300210.0, ans=0.125 2024-08-13 20:22:50,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12650, loss[loss=0.09686, beats_loss=0.01136, ecapa_loss=0.0001766, whisper_loss=0.08374, over 16654.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.000163, whisper_loss=0.0916, over 3855577.44 frames. ], batch size: 70, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:23:13,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.379e+01 2.634e+01 2.946e+01 5.512e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-13 20:23:16,275 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 20:23:25,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2300510.0, ans=0.2 2024-08-13 20:23:29,891 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 20:23:34,851 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 20:23:36,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-13 20:23:39,889 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 20:23:47,920 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 20:23:52,683 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 20:24:07,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12700, loss[loss=0.09734, beats_loss=0.008927, ecapa_loss=0.0001724, whisper_loss=0.08669, over 18599.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001625, whisper_loss=0.09145, over 3830145.76 frames. ], batch size: 75, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:24:30,095 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 20:24:33,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2300910.0, ans=0.0 2024-08-13 20:24:44,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5 2024-08-13 20:24:56,217 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 20:25:09,878 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 20:25:18,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2024-08-13 20:25:22,851 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 20:25:26,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12750, loss[loss=0.08904, beats_loss=0.00851, ecapa_loss=0.000204, whisper_loss=0.07849, over 16545.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01094, ecapa_loss=0.0001619, whisper_loss=0.09115, over 3867019.70 frames. ], batch size: 69, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:25:29,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0 2024-08-13 20:25:33,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-13 20:25:47,351 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-13 20:25:50,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.318e+01 2.587e+01 2.901e+01 2.435e+02, threshold=5.175e+01, percent-clipped=0.0 2024-08-13 20:25:50,449 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 20:25:52,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2301410.0, ans=0.0 2024-08-13 20:25:55,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2301410.0, ans=0.1 2024-08-13 20:26:01,436 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 20:26:03,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2301510.0, ans=0.125 2024-08-13 20:26:35,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2301710.0, ans=0.125 2024-08-13 20:26:45,282 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12800, loss[loss=0.08778, beats_loss=0.01019, ecapa_loss=0.0001677, whisper_loss=0.07592, over 19228.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001634, whisper_loss=0.09108, over 3870917.58 frames. ], batch size: 74, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:26:47,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2301810.0, ans=0.1 2024-08-13 20:27:14,503 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 20:27:31,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2302110.0, ans=0.0 2024-08-13 20:27:59,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2302210.0, ans=0.125 2024-08-13 20:28:03,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12850, loss[loss=0.08743, beats_loss=0.01285, ecapa_loss=0.0001629, whisper_loss=0.07296, over 17033.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001636, whisper_loss=0.09091, over 3876774.04 frames. ], batch size: 70, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:28:20,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2302410.0, ans=0.125 2024-08-13 20:28:21,571 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 29 from Vox, 33 from AS 2024-08-13 20:28:21,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2302410.0, ans=0.0 2024-08-13 20:28:23,174 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 22 from Vox, 21 from AS 2024-08-13 20:28:26,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.352e+01 2.567e+01 2.932e+01 5.459e+01, threshold=5.134e+01, percent-clipped=2.0 2024-08-13 20:28:29,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2302410.0, ans=0.125 2024-08-13 20:28:30,925 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 13 from LS+wenet, 21 from Vox, 34 from AS 2024-08-13 20:28:40,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2302510.0, ans=0.125 2024-08-13 20:28:43,019 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 from AS 2024-08-13 20:29:19,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-13 20:29:20,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12900, loss[loss=0.09529, beats_loss=0.01052, ecapa_loss=0.0001421, whisper_loss=0.08335, over 15645.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001634, whisper_loss=0.09029, over 3831416.79 frames. ], batch size: 55, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:29:23,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2302810.0, ans=0.0 2024-08-13 20:29:25,930 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
19 from LS+wenet, 14 from Vox, 36 from AS 2024-08-13 20:29:26,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2302810.0, ans=0.1 2024-08-13 20:29:32,244 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 20:29:36,654 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 20:29:38,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.81 vs. limit=22.5 2024-08-13 20:29:55,799 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 28 from Vox, 26 from AS 2024-08-13 20:29:57,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2303010.0, ans=0.04949747468305833 2024-08-13 20:30:12,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2303110.0, ans=0.1 2024-08-13 20:30:20,055 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 20:30:33,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2303210.0, ans=0.0 2024-08-13 20:30:37,722 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 from AS 2024-08-13 20:30:39,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 12950, loss[loss=0.127, beats_loss=0.007615, ecapa_loss=0.0001589, whisper_loss=0.1178, over 18275.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0108, ecapa_loss=0.0001633, whisper_loss=0.09044, over 3847986.43 frames. 
], batch size: 68, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:31:01,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.283e+01 2.671e+01 2.992e+01 6.489e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 20:31:14,962 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 from AS 2024-08-13 20:31:22,053 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 from AS 2024-08-13 20:31:24,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-13 20:31:25,552 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 32 from Vox, 33 from AS 2024-08-13 20:31:27,495 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-13 20:31:31,626 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 from AS 2024-08-13 20:31:43,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2303710.0, ans=0.125 2024-08-13 20:31:59,074 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13000, loss[loss=0.106, beats_loss=0.009602, ecapa_loss=0.0001252, whisper_loss=0.09514, over 14635.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01085, ecapa_loss=0.0001622, whisper_loss=0.0904, over 3866094.19 frames. ], batch size: 54, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:32:04,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2303810.0, ans=0.0 2024-08-13 20:32:17,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2303910.0, ans=0.0 2024-08-13 20:32:58,783 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
17 from LS+wenet, 13 from Vox, 36 from AS 2024-08-13 20:33:11,144 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 from AS 2024-08-13 20:33:23,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13050, loss[loss=0.1068, beats_loss=0.0129, ecapa_loss=0.0001235, whisper_loss=0.09265, over 15674.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01088, ecapa_loss=0.0001616, whisper_loss=0.08977, over 3811937.44 frames. ], batch size: 62, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:33:30,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2304310.0, ans=0.04949747468305833 2024-08-13 20:33:55,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.312e+01 2.609e+01 2.956e+01 5.975e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-13 20:33:58,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2304410.0, ans=0.125 2024-08-13 20:34:01,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2304410.0, ans=15.0 2024-08-13 20:34:16,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2304510.0, ans=0.125 2024-08-13 20:34:48,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2304610.0, ans=0.125 2024-08-13 20:35:02,324 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 25 from Vox, 30 from AS 2024-08-13 20:35:07,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. 
limit=15.0 2024-08-13 20:35:14,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2304810.0, ans=0.1 2024-08-13 20:35:16,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13100, loss[loss=0.1168, beats_loss=0.009359, ecapa_loss=0.0001997, whisper_loss=0.1054, over 15408.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001623, whisper_loss=0.09039, over 3810796.18 frames. ], batch size: 64, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:35:24,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-13 20:35:34,829 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 16 from Vox, 43 from AS 2024-08-13 20:35:40,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-08-13 20:36:09,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2305010.0, ans=0.125 2024-08-13 20:36:16,649 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 20:36:19,073 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 from AS 2024-08-13 20:36:37,515 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 41 from LS+wenet, 12 from Vox, 33 from AS 2024-08-13 20:37:02,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13150, loss[loss=0.1011, beats_loss=0.0116, ecapa_loss=0.0001472, whisper_loss=0.08798, over 23482.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01083, ecapa_loss=0.0001612, whisper_loss=0.09053, over 3835143.44 frames. 
], batch size: 93, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:37:07,621 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 23 from Vox, 21 from AS 2024-08-13 20:37:10,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2305310.0, ans=0.0 2024-08-13 20:37:22,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2305310.0, ans=0.125 2024-08-13 20:37:37,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2305410.0, ans=0.2 2024-08-13 20:37:39,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.460e+01 2.677e+01 2.995e+01 4.365e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-13 20:38:05,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2305510.0, ans=0.1 2024-08-13 20:38:08,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2305510.0, ans=0.0 2024-08-13 20:38:21,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2305610.0, ans=0.0 2024-08-13 20:38:41,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2305710.0, ans=0.0 2024-08-13 20:38:48,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2305710.0, ans=0.125 2024-08-13 20:38:56,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2305710.0, ans=0.125 2024-08-13 20:39:00,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2305710.0, ans=0.0 2024-08-13 20:39:04,645 INFO [train_multi_KD3.py:1116] 
(0/4) Epoch 16, batch 13200, loss[loss=0.1346, beats_loss=0.00716, ecapa_loss=0.0001565, whisper_loss=0.1259, over 18677.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0109, ecapa_loss=0.0001609, whisper_loss=0.08995, over 3847339.33 frames. ], batch size: 69, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:39:22,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2305810.0, ans=0.125 2024-08-13 20:40:30,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2306110.0, ans=0.5 2024-08-13 20:40:40,724 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 from AS 2024-08-13 20:41:12,225 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13250, loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001448, whisper_loss=0.09138, over 19252.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01084, ecapa_loss=0.0001614, whisper_loss=0.09018, over 3832314.68 frames. ], batch size: 75, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:41:12,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2306310.0, ans=0.125 2024-08-13 20:41:26,456 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 31 from Vox, 36 from AS 2024-08-13 20:41:44,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2306410.0, ans=0.2 2024-08-13 20:41:48,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.383e+01 2.626e+01 2.998e+01 4.392e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-13 20:42:30,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.66 vs. 
limit=22.5 2024-08-13 20:42:33,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2306610.0, ans=0.125 2024-08-13 20:42:36,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2306710.0, ans=0.0 2024-08-13 20:42:39,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2306710.0, ans=0.2 2024-08-13 20:42:53,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13300, loss[loss=0.08554, beats_loss=0.01192, ecapa_loss=0.0001347, whisper_loss=0.07227, over 18511.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0108, ecapa_loss=0.0001615, whisper_loss=0.09025, over 3827741.13 frames. ], batch size: 72, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:42:53,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2306810.0, ans=0.125 2024-08-13 20:43:01,654 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08281490951776505, model_norm_threshold=52.51310729980469 2024-08-13 20:43:01,876 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.983e+04, grad_sumsq=6.983e+04, orig_rms_sq=1.000e+00 2024-08-13 20:43:13,156 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 11 from LS+wenet, 18 from Vox, 36 from AS 2024-08-13 20:43:26,863 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 29 from Vox, 39 from AS 2024-08-13 20:43:36,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2307010.0, ans=0.1 2024-08-13 20:43:38,340 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 16 from Vox, 23 from AS 2024-08-13 20:43:49,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. limit=6.0 2024-08-13 20:43:57,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2307210.0, ans=0.125 2024-08-13 20:44:12,636 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 from AS 2024-08-13 20:44:13,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13350, loss[loss=0.1222, beats_loss=0.01119, ecapa_loss=0.000137, whisper_loss=0.1096, over 22872.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01091, ecapa_loss=0.0001601, whisper_loss=0.09028, over 3833814.06 frames. ], batch size: 88, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:44:19,621 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 from AS 2024-08-13 20:44:37,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2307410.0, ans=0.2 2024-08-13 20:44:38,201 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.444e+01 2.749e+01 3.154e+01 6.341e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 20:44:50,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2307510.0, ans=0.125 2024-08-13 20:45:16,832 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 26 from Vox, 38 from AS 2024-08-13 20:45:20,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2307710.0, ans=0.2 2024-08-13 20:45:21,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2307710.0, ans=0.0 2024-08-13 20:45:24,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2307710.0, ans=0.125 2024-08-13 20:45:29,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2307710.0, ans=0.1 2024-08-13 20:45:34,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13400, loss[loss=0.09857, beats_loss=0.01119, ecapa_loss=0.0001714, whisper_loss=0.08566, over 20946.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01088, ecapa_loss=0.0001611, whisper_loss=0.08962, over 3825200.38 frames. ], batch size: 82, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:45:36,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2307810.0, ans=0.07 2024-08-13 20:45:48,457 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 from AS 2024-08-13 20:46:05,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2308010.0, ans=0.125 2024-08-13 20:46:05,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2308010.0, ans=0.125 2024-08-13 20:46:28,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2308110.0, ans=0.2 2024-08-13 20:46:31,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2308110.0, ans=0.0 2024-08-13 20:46:45,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2308210.0, ans=0.2 2024-08-13 20:46:54,605 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13450, loss[loss=0.09162, beats_loss=0.01182, ecapa_loss=0.00015, whisper_loss=0.0783, over 18061.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01093, ecapa_loss=0.0001624, whisper_loss=0.08921, over 3843096.12 frames. ], batch size: 74, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:46:54,794 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 from AS 2024-08-13 20:46:59,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2308310.0, ans=0.0 2024-08-13 20:47:01,142 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-13 20:47:17,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.358e+01 2.676e+01 3.282e+01 1.336e+02, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 20:47:44,978 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
17 from LS+wenet, 19 from Vox, 30 from AS 2024-08-13 20:47:57,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2308710.0, ans=0.2 2024-08-13 20:48:01,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=8.0 2024-08-13 20:48:06,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2308710.0, ans=0.125 2024-08-13 20:48:13,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13500, loss[loss=0.09407, beats_loss=0.01142, ecapa_loss=0.0001978, whisper_loss=0.08067, over 16914.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01088, ecapa_loss=0.0001635, whisper_loss=0.08974, over 3834818.08 frames. ], batch size: 68, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:48:35,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2308910.0, ans=0.125 2024-08-13 20:48:36,840 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 from AS 2024-08-13 20:48:38,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2308910.0, ans=0.0 2024-08-13 20:48:39,785 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 from AS 2024-08-13 20:48:48,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2309010.0, ans=0.125 2024-08-13 20:49:19,003 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
20 from LS+wenet, 20 from Vox, 32 from AS 2024-08-13 20:49:34,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2309310.0, ans=0.025 2024-08-13 20:49:36,494 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13550, loss[loss=0.1003, beats_loss=0.01153, ecapa_loss=0.0001896, whisper_loss=0.08689, over 22335.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01091, ecapa_loss=0.0001629, whisper_loss=0.08972, over 3837436.64 frames. ], batch size: 93, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:49:47,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2309310.0, ans=0.0 2024-08-13 20:49:58,216 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 15 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 20:50:00,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2309410.0, ans=0.2 2024-08-13 20:50:00,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.439e+01 2.648e+01 3.076e+01 1.090e+02, threshold=5.296e+01, percent-clipped=1.0 2024-08-13 20:50:10,689 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 19 from Vox, 31 from AS 2024-08-13 20:50:17,246 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 from AS 2024-08-13 20:50:20,310 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 from AS 2024-08-13 20:50:24,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. 
limit=15.0 2024-08-13 20:50:30,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2309610.0, ans=0.125 2024-08-13 20:50:30,057 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.086e+05 2024-08-13 20:50:35,330 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.675e-03 2024-08-13 20:50:56,506 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13600, loss[loss=0.09784, beats_loss=0.008798, ecapa_loss=0.0001768, whisper_loss=0.08727, over 16051.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01093, ecapa_loss=0.0001617, whisper_loss=0.09002, over 3845031.88 frames. ], batch size: 63, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:51:03,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2309810.0, ans=0.125 2024-08-13 20:51:06,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2309810.0, ans=0.07 2024-08-13 20:51:09,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2309810.0, ans=0.0 2024-08-13 20:51:24,724 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 from AS 2024-08-13 20:51:25,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2309910.0, ans=0.125 2024-08-13 20:51:26,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2310010.0, ans=0.1 2024-08-13 20:51:32,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. 
limit=15.0 2024-08-13 20:51:52,974 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 20:51:53,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2310110.0, ans=0.125 2024-08-13 20:51:58,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-08-13 20:52:00,937 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 21 from Vox, 48 from AS 2024-08-13 20:52:06,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2310210.0, ans=0.125 2024-08-13 20:52:10,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2310210.0, ans=0.0 2024-08-13 20:52:15,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13650, loss[loss=0.1022, beats_loss=0.009621, ecapa_loss=0.0001697, whisper_loss=0.09093, over 22707.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01096, ecapa_loss=0.0001619, whisper_loss=0.08985, over 3863502.60 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:52:30,867 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 20:52:34,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-13 20:52:37,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.402e+01 2.676e+01 3.034e+01 5.771e+01, threshold=5.352e+01, percent-clipped=1.0 2024-08-13 20:52:40,853 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 10 from Vox, 29 from AS 2024-08-13 20:52:42,427 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 25 from Vox, 37 from AS 2024-08-13 20:52:45,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2310510.0, ans=0.125 2024-08-13 20:52:48,299 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 from AS 2024-08-13 20:52:51,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2310510.0, ans=0.125 2024-08-13 20:52:59,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2310610.0, ans=0.125 2024-08-13 20:53:05,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2310610.0, ans=0.125 2024-08-13 20:53:05,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2310610.0, ans=0.1 2024-08-13 20:53:10,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2024-08-13 20:53:30,203 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13700, loss[loss=0.09922, beats_loss=0.01342, ecapa_loss=0.000146, whisper_loss=0.08434, over 22344.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001631, whisper_loss=0.0904, over 3856611.66 frames. 
], batch size: 88, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:53:30,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2310810.0, ans=0.1 2024-08-13 20:53:36,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2310810.0, ans=0.125 2024-08-13 20:53:37,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=22.5 2024-08-13 20:53:45,961 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 from AS 2024-08-13 20:53:49,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2310910.0, ans=0.0 2024-08-13 20:53:49,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=12.0 2024-08-13 20:53:53,141 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 19 from Vox, 48 from AS 2024-08-13 20:54:07,906 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 from AS 2024-08-13 20:54:12,841 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 from AS 2024-08-13 20:54:19,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2311110.0, ans=0.125 2024-08-13 20:54:23,739 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
19 from LS+wenet, 27 from Vox, 30 from AS 2024-08-13 20:54:38,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2311210.0, ans=0.1 2024-08-13 20:54:43,308 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13750, loss[loss=0.1156, beats_loss=0.008122, ecapa_loss=0.0001788, whisper_loss=0.1057, over 13534.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.0001626, whisper_loss=0.09105, over 3869215.46 frames. ], batch size: 54, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:54:59,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2311410.0, ans=0.125 2024-08-13 20:55:05,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.299e+01 2.662e+01 2.929e+01 4.195e+01, threshold=5.323e+01, percent-clipped=0.0 2024-08-13 20:55:33,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2311610.0, ans=0.125 2024-08-13 20:55:36,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2311610.0, ans=0.2 2024-08-13 20:55:40,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2311710.0, ans=0.125 2024-08-13 20:55:45,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2311710.0, ans=0.2 2024-08-13 20:55:57,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13800, loss[loss=0.1041, beats_loss=0.01051, ecapa_loss=0.0001769, whisper_loss=0.09178, over 16250.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01073, ecapa_loss=0.0001608, whisper_loss=0.0916, over 3867520.63 frames. 
], batch size: 65, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:55:59,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.01 vs. limit=10.0
2024-08-13 20:56:00,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2311810.0, ans=0.0
2024-08-13 20:56:15,133 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 from AS
2024-08-13 20:56:21,313 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.182e+01
2024-08-13 20:56:26,512 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 from AS
2024-08-13 20:56:37,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0
2024-08-13 20:56:42,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2312110.0, ans=0.125
2024-08-13 20:56:49,256 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 from AS
2024-08-13 20:57:03,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2312210.0, ans=0.1
2024-08-13 20:57:08,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2312310.0, ans=0.125
2024-08-13 20:57:08,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13850, loss[loss=0.1028, beats_loss=0.009124, ecapa_loss=0.0001788, whisper_loss=0.09189, over 20781.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001616, whisper_loss=0.09183, over 3860992.54 frames.
], batch size: 84, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:57:28,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2312410.0, ans=0.0
2024-08-13 20:57:31,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.385e+01 2.789e+01 3.325e+01 4.881e+01, threshold=5.578e+01, percent-clipped=0.0
2024-08-13 20:57:32,457 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 from AS
2024-08-13 20:57:59,963 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 20:58:04,647 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 from AS
2024-08-13 20:58:11,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2312710.0, ans=0.1
2024-08-13 20:58:17,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0
2024-08-13 20:58:21,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13900, loss[loss=0.108, beats_loss=0.008196, ecapa_loss=0.0001699, whisper_loss=0.09807, over 15608.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.0001598, whisper_loss=0.09208, over 3890338.01 frames. ], batch size: 57, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:58:22,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2312810.0, ans=0.125
2024-08-13 20:58:28,980 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 from AS
2024-08-13 20:58:31,452 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
29 from LS+wenet, 27 from Vox, 35 from AS
2024-08-13 20:58:42,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.11 vs. limit=15.0
2024-08-13 20:58:44,800 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 39 from LS+wenet, 20 from Vox, 28 from AS
2024-08-13 20:59:03,958 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 20 from Vox, 24 from AS
2024-08-13 20:59:06,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2313110.0, ans=0.125
2024-08-13 20:59:34,203 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 13950, loss[loss=0.1226, beats_loss=0.008415, ecapa_loss=0.0001548, whisper_loss=0.1127, over 23940.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01065, ecapa_loss=0.0001611, whisper_loss=0.0926, over 3909991.28 frames. ], batch size: 92, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:59:40,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2313310.0, ans=0.125
2024-08-13 20:59:47,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2313410.0, ans=0.125
2024-08-13 20:59:51,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2313410.0, ans=0.125
2024-08-13 20:59:53,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2313410.0, ans=0.07
2024-08-13 20:59:56,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.456e+01 2.773e+01 3.183e+01 5.275e+01, threshold=5.547e+01, percent-clipped=0.0
2024-08-13 21:00:04,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit,
batch_count=2313510.0, ans=15.0
2024-08-13 21:00:22,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2313610.0, ans=0.125
2024-08-13 21:00:41,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2313710.0, ans=0.125
2024-08-13 21:00:48,826 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14000, loss[loss=0.0881, beats_loss=0.01331, ecapa_loss=0.0001148, whisper_loss=0.07364, over 17653.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01068, ecapa_loss=0.0001613, whisper_loss=0.09223, over 3894411.97 frames. ], batch size: 69, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:00:49,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=12.0
2024-08-13 21:00:52,318 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS
2024-08-13 21:00:59,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2313810.0, ans=0.125
2024-08-13 21:01:00,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2313810.0, ans=0.0
2024-08-13 21:01:10,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=12.0
2024-08-13 21:01:44,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2314110.0, ans=0.0
2024-08-13 21:01:49,591 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
23 from LS+wenet, 12 from Vox, 29 from AS
2024-08-13 21:02:02,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14050, loss[loss=0.1153, beats_loss=0.009072, ecapa_loss=0.000199, whisper_loss=0.1042, over 18612.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01065, ecapa_loss=0.0001609, whisper_loss=0.09276, over 3919453.18 frames. ], batch size: 76, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:02:11,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2314310.0, ans=0.1
2024-08-13 21:02:16,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5
2024-08-13 21:02:24,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.387e+01 2.626e+01 2.873e+01 4.572e+01, threshold=5.251e+01, percent-clipped=0.0
2024-08-13 21:02:25,794 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 from AS
2024-08-13 21:02:37,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2314510.0, ans=0.0
2024-08-13 21:02:40,012 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 15 from Vox, 41 from AS
2024-08-13 21:02:56,412 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 25 from Vox, 24 from AS
2024-08-13 21:03:00,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0
2024-08-13 21:03:09,630 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 15 from Vox, 49 from AS
2024-08-13 21:03:12,423 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts.
19 from LS+wenet, 18 from Vox, 19 from AS
2024-08-13 21:03:15,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14100, loss[loss=0.1104, beats_loss=0.0112, ecapa_loss=0.0001851, whisper_loss=0.09732, over 22005.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01065, ecapa_loss=0.0001604, whisper_loss=0.0927, over 3885845.18 frames. ], batch size: 90, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:03:28,306 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 9 from LS+wenet, 25 from Vox, 30 from AS
2024-08-13 21:03:37,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2314910.0, ans=0.125
2024-08-13 21:03:52,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2315010.0, ans=0.2
2024-08-13 21:03:53,502 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS
2024-08-13 21:03:53,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2315010.0, ans=0.125
2024-08-13 21:03:59,061 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 30 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 21:04:05,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2315110.0, ans=0.5
2024-08-13 21:04:08,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2315110.0, ans=0.0
2024-08-13 21:04:16,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2315210.0, ans=0.1
2024-08-13 21:04:18,731 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 from AS
2024-08-13 21:04:21,929 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts.
27 from LS+wenet, 13 from Vox, 34 from AS
2024-08-13 21:04:26,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14150, loss[loss=0.08114, beats_loss=0.01344, ecapa_loss=0.0001227, whisper_loss=0.06646, over 17641.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.0001597, whisper_loss=0.09197, over 3882164.21 frames. ], batch size: 71, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:04:44,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2315410.0, ans=0.2
2024-08-13 21:04:46,672 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 from AS
2024-08-13 21:04:48,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2315410.0, ans=0.125
2024-08-13 21:04:49,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.465e+01 2.633e+01 2.994e+01 4.985e+01, threshold=5.265e+01, percent-clipped=0.0
2024-08-13 21:04:57,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2315510.0, ans=0.07
2024-08-13 21:04:59,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2315510.0, ans=0.0
2024-08-13 21:05:06,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2315510.0, ans=0.125
2024-08-13 21:05:15,924 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 17 from Vox, 39 from AS
2024-08-13 21:05:41,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14200, loss[loss=0.08893, beats_loss=0.009046, ecapa_loss=0.0001903, whisper_loss=0.07798, over 13530.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001604, whisper_loss=0.09167, over 3885186.57 frames.
], batch size: 57, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:05:43,111 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 18 from Vox, 48 from AS
2024-08-13 21:05:43,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2315810.0, ans=0.0
2024-08-13 21:05:43,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2315810.0, ans=0.125
2024-08-13 21:05:44,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2315810.0, ans=0.025
2024-08-13 21:05:49,061 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 23 from Vox, 25 from AS
2024-08-13 21:05:53,862 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 from AS
2024-08-13 21:06:02,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.74 vs. limit=6.0
2024-08-13 21:06:08,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0
2024-08-13 21:06:19,736 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 from AS
2024-08-13 21:06:45,263 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 from AS
2024-08-13 21:07:00,904 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14250, loss[loss=0.1196, beats_loss=0.008009, ecapa_loss=0.000177, whisper_loss=0.1098, over 21194.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01065, ecapa_loss=0.0001615, whisper_loss=0.09207, over 3865325.57 frames.
], batch size: 82, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:07:12,913 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 from AS
2024-08-13 21:07:23,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2316410.0, ans=0.125
2024-08-13 21:07:24,719 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.489e+01 2.737e+01 3.188e+01 4.877e+01, threshold=5.475e+01, percent-clipped=0.0
2024-08-13 21:07:25,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2316410.0, ans=0.125
2024-08-13 21:07:39,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2316510.0, ans=0.125
2024-08-13 21:07:53,418 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 from AS
2024-08-13 21:08:17,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14300, loss[loss=0.1051, beats_loss=0.01025, ecapa_loss=0.000154, whisper_loss=0.09327, over 23284.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001608, whisper_loss=0.09155, over 3871994.50 frames. ], batch size: 94, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:08:32,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2316910.0, ans=0.05
2024-08-13 21:08:34,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2316910.0, ans=0.025
2024-08-13 21:08:35,806 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts.
20 from LS+wenet, 14 from Vox, 25 from AS
2024-08-13 21:08:56,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2317010.0, ans=0.125
2024-08-13 21:09:01,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2317110.0, ans=0.125
2024-08-13 21:09:04,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=8.0
2024-08-13 21:09:13,671 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 12 from Vox, 44 from AS
2024-08-13 21:09:33,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14350, loss[loss=0.1141, beats_loss=0.008376, ecapa_loss=0.0001955, whisper_loss=0.1038, over 15506.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001599, whisper_loss=0.09141, over 3854778.35 frames. ], batch size: 58, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:09:42,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0
2024-08-13 21:09:56,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.382e+01 2.716e+01 3.017e+01 1.009e+02, threshold=5.432e+01, percent-clipped=2.0
2024-08-13 21:10:20,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.11 vs.
limit=22.5
2024-08-13 21:10:40,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2317710.0, ans=0.125
2024-08-13 21:10:42,893 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.874e-03
2024-08-13 21:10:49,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14400, loss[loss=0.1348, beats_loss=0.01055, ecapa_loss=0.000148, whisper_loss=0.1228, over 23091.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.00016, whisper_loss=0.09167, over 3867771.67 frames. ], batch size: 89, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:10:58,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2317810.0, ans=0.0
2024-08-13 21:11:01,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2317810.0, ans=0.125
2024-08-13 21:11:02,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2317810.0, ans=0.125
2024-08-13 21:11:05,326 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 34 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 21:11:18,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2318010.0, ans=0.125
2024-08-13 21:11:20,864 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
22 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 21:11:21,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2318010.0, ans=0.5
2024-08-13 21:11:29,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2318010.0, ans=0.0
2024-08-13 21:11:32,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2318010.0, ans=0.2
2024-08-13 21:11:34,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2318110.0, ans=0.0
2024-08-13 21:11:34,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2318110.0, ans=0.125
2024-08-13 21:11:39,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2318110.0, ans=0.125
2024-08-13 21:11:41,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2318110.0, ans=0.125
2024-08-13 21:11:44,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0
2024-08-13 21:11:56,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2318210.0, ans=0.1
2024-08-13 21:12:05,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 14450, loss[loss=0.07829, beats_loss=0.01069, ecapa_loss=0.0001869, whisper_loss=0.06573, over 16761.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001596, whisper_loss=0.09061, over 3875361.69 frames.
], batch size: 68, lr: 3.86e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:12:06,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2318310.0, ans=0.125
2024-08-13 21:12:10,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2318310.0, ans=0.125
2024-08-13 21:12:15,127 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 from AS
2024-08-13 21:12:15,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2318310.0, ans=0.0
2024-08-13 21:12:18,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2318310.0, ans=0.0
2024-08-13 21:12:18,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2318310.0, ans=0.2
2024-08-13 21:12:28,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.376e+01 2.680e+01 3.028e+01 6.046e+01, threshold=5.360e+01, percent-clipped=1.0
2024-08-13 21:12:36,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2318510.0, ans=0.125
2024-08-13 21:12:51,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.93 vs.
limit=10.0
2024-08-13 21:13:08,323 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-16.pt
2024-08-13 21:13:47,650 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 0, loss[loss=0.1291, beats_loss=0.009004, ecapa_loss=0.0001574, whisper_loss=0.1185, over 21852.00 frames. ], tot_loss[loss=0.1291, beats_loss=0.009004, ecapa_loss=0.0001574, whisper_loss=0.1185, over 21852.00 frames. ], batch size: 84, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:13:47,651 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-13 21:14:29,933 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2478, over 922467.00 frames.
2024-08-13 21:14:37,476 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6198, 2.4792, 2.8001, 2.1177], device='cuda:0')
2024-08-13 21:14:45,245 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0354, 4.5518, 4.7894, 4.9723], device='cuda:0')
2024-08-13 21:14:46,271 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on SV_voxceleb1: loss=0.004509, beats_loss=0, ecapa_loss=0.0004509, whisper_loss=0, over 939242.00 frames.
2024-08-13 21:16:46,351 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on AT_audioset: loss=0.02361, beats_loss=0.02361, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 21:16:46,356 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB
2024-08-13 21:16:59,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2318730.0, ans=0.07
2024-08-13 21:16:59,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2318730.0, ans=0.1
2024-08-13 21:17:03,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2318730.0, ans=0.125
2024-08-13 21:17:05,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2318730.0, ans=0.125
2024-08-13 21:17:13,784 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS
2024-08-13 21:17:42,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2318930.0, ans=0.125
2024-08-13 21:17:42,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0
2024-08-13 21:17:50,164 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 from AS
2024-08-13 21:18:33,480 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 38 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 21:18:46,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2319130.0, ans=0.1
2024-08-13 21:18:58,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 50, loss[loss=0.09859, beats_loss=0.01281, ecapa_loss=0.0001388, whisper_loss=0.08439, over 20458.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01001, ecapa_loss=0.0001649, whisper_loss=0.09153, over 900973.68 frames.
], batch size: 84, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:19:21,408 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 from AS
2024-08-13 21:19:55,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.701e+01 3.109e+01 3.430e+01 6.788e+01, threshold=6.217e+01, percent-clipped=2.0
2024-08-13 21:20:10,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2319530.0, ans=0.2
2024-08-13 21:20:12,544 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-13 21:20:20,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2319530.0, ans=0.09899494936611666
2024-08-13 21:20:29,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2319530.0, ans=0.1
2024-08-13 21:20:53,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2319630.0, ans=0.125
2024-08-13 21:20:53,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2319630.0, ans=15.0
2024-08-13 21:21:00,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 100, loss[loss=0.1117, beats_loss=0.009297, ecapa_loss=0.0001943, whisper_loss=0.1005, over 21040.00 frames. ], tot_loss[loss=0.102, beats_loss=0.009955, ecapa_loss=0.0001663, whisper_loss=0.0904, over 1579742.02 frames.
], batch size: 90, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:21:00,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2319730.0, ans=0.125
2024-08-13 21:21:47,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2319930.0, ans=0.04949747468305833
2024-08-13 21:22:01,261 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-232000.pt
2024-08-13 21:22:19,194 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.543e-02
2024-08-13 21:22:21,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2320030.0, ans=0.125
2024-08-13 21:22:50,827 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 from AS
2024-08-13 21:22:52,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 150, loss[loss=0.1104, beats_loss=0.009652, ecapa_loss=0.0001575, whisper_loss=0.0992, over 17301.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.009975, ecapa_loss=0.0001635, whisper_loss=0.09054, over 2089542.25 frames. ], batch size: 68, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:23:05,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0
2024-08-13 21:23:16,187 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 15 from Vox, 42 from AS
2024-08-13 21:23:22,584 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts.
32 from LS+wenet, 21 from Vox, 34 from AS
2024-08-13 21:23:29,388 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 13 from LS+wenet, 18 from Vox, 35 from AS
2024-08-13 21:23:32,377 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.591e+01 2.910e+01 3.226e+01 4.259e+01, threshold=5.820e+01, percent-clipped=0.0
2024-08-13 21:23:42,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2320530.0, ans=0.0
2024-08-13 21:23:42,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2320530.0, ans=0.125
2024-08-13 21:23:46,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5
2024-08-13 21:24:00,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2320630.0, ans=0.125
2024-08-13 21:24:12,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2320630.0, ans=0.125
2024-08-13 21:24:16,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 200, loss[loss=0.08299, beats_loss=0.01037, ecapa_loss=0.0001869, whisper_loss=0.07075, over 14254.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01005, ecapa_loss=0.0001636, whisper_loss=0.09112, over 2441087.32 frames. ], batch size: 56, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:24:28,998 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS
2024-08-13 21:24:31,771 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts.
14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 21:24:31,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2320830.0, ans=0.035 2024-08-13 21:24:34,883 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 21:25:13,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2321030.0, ans=0.0 2024-08-13 21:25:19,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2321030.0, ans=0.125 2024-08-13 21:25:25,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2321130.0, ans=0.125 2024-08-13 21:25:27,496 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 21:25:38,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 250, loss[loss=0.09455, beats_loss=0.0098, ecapa_loss=0.000192, whisper_loss=0.08283, over 21088.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01014, ecapa_loss=0.0001645, whisper_loss=0.0909, over 2734272.72 frames. ], batch size: 85, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:25:42,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2321230.0, ans=0.0 2024-08-13 21:25:52,177 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 21:26:05,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2321330.0, ans=0.0 2024-08-13 21:26:05,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2321330.0, ans=0.2 2024-08-13 21:26:17,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.338e+01 2.625e+01 3.056e+01 3.496e+02, threshold=5.250e+01, percent-clipped=1.0 2024-08-13 21:26:24,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2321430.0, ans=0.0 2024-08-13 21:26:40,409 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 21:26:52,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2321630.0, ans=0.125 2024-08-13 21:27:02,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 300, loss[loss=0.1016, beats_loss=0.01242, ecapa_loss=0.0001245, whisper_loss=0.08796, over 20042.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001628, whisper_loss=0.09026, over 2957704.33 frames. ], batch size: 76, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:27:04,221 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 21:27:19,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2321830.0, ans=0.0 2024-08-13 21:27:21,174 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 21:27:42,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2321930.0, ans=0.1 2024-08-13 21:27:42,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2321930.0, ans=0.125 2024-08-13 21:27:45,998 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 21:28:27,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2322230.0, ans=0.125 2024-08-13 21:28:29,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 350, loss[loss=0.104, beats_loss=0.009504, ecapa_loss=0.000144, whisper_loss=0.09308, over 17286.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001621, whisper_loss=0.0904, over 3154477.62 frames. ], batch size: 62, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:28:35,456 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-13 21:28:43,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2322230.0, ans=0.2 2024-08-13 21:29:08,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.409e+01 2.739e+01 3.112e+01 5.763e+01, threshold=5.479e+01, percent-clipped=3.0 2024-08-13 21:29:22,714 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 21:29:24,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2322530.0, ans=0.1 2024-08-13 21:29:38,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.44 vs. 
limit=15.0 2024-08-13 21:29:40,220 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 21:29:57,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 400, loss[loss=0.1043, beats_loss=0.008761, ecapa_loss=0.0001756, whisper_loss=0.09379, over 16229.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01059, ecapa_loss=0.00016, whisper_loss=0.08926, over 3292535.46 frames. ], batch size: 65, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:29:58,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=12.0 2024-08-13 21:30:04,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2322730.0, ans=0.1 2024-08-13 21:30:18,089 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 21:30:29,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-13 21:30:34,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2322930.0, ans=0.0 2024-08-13 21:30:42,350 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 21:30:57,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2323030.0, ans=0.1 2024-08-13 21:31:02,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2323030.0, ans=0.125 2024-08-13 21:31:05,688 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 21:31:24,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2323230.0, ans=0.125 2024-08-13 21:31:25,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 450, loss[loss=0.08835, beats_loss=0.01251, ecapa_loss=0.0001683, whisper_loss=0.07415, over 21375.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01067, ecapa_loss=0.0001593, whisper_loss=0.08924, over 3434703.51 frames. ], batch size: 89, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:31:37,649 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 21:31:54,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2323330.0, ans=0.0 2024-08-13 21:31:54,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2323330.0, ans=0.125 2024-08-13 21:32:06,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.385e+01 2.600e+01 2.999e+01 5.733e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-13 21:32:10,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2024-08-13 21:32:17,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2323530.0, ans=0.2 2024-08-13 21:32:17,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2323530.0, ans=0.125 2024-08-13 21:32:31,812 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 8 from Vox, 31 fro AS 2024-08-13 21:32:39,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2323630.0, ans=0.0 2024-08-13 21:32:46,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2323630.0, ans=0.125 2024-08-13 21:32:52,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 500, loss[loss=0.1062, beats_loss=0.009615, ecapa_loss=0.0001875, whisper_loss=0.09474, over 19300.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0107, ecapa_loss=0.0001593, whisper_loss=0.08957, over 3515334.78 frames. ], batch size: 76, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:32:58,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0 2024-08-13 21:32:59,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2323730.0, ans=0.125 2024-08-13 21:33:03,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2323730.0, ans=0.125 2024-08-13 21:33:03,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2323730.0, ans=0.125 2024-08-13 21:33:08,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2323830.0, ans=0.2 2024-08-13 21:33:10,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2323830.0, ans=0.125 2024-08-13 21:33:20,118 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
23 from LS+wenet, 17 from Vox, 14 fro AS 2024-08-13 21:33:22,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5 2024-08-13 21:33:33,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2323930.0, ans=0.0 2024-08-13 21:33:46,784 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 21:34:13,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 550, loss[loss=0.1122, beats_loss=0.01159, ecapa_loss=0.0001764, whisper_loss=0.09887, over 17315.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01077, ecapa_loss=0.0001587, whisper_loss=0.08949, over 3567950.28 frames. ], batch size: 71, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:34:13,804 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 21:34:14,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-13 21:34:23,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2324230.0, ans=0.95 2024-08-13 21:34:25,366 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 21:34:49,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2324430.0, ans=0.125 2024-08-13 21:34:51,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.278e+01 2.510e+01 2.744e+01 4.092e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-13 21:35:02,134 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
16 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 21:35:14,595 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 21:35:25,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2324630.0, ans=0.125 2024-08-13 21:35:30,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2324730.0, ans=0.04949747468305833 2024-08-13 21:35:31,572 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 600, loss[loss=0.08386, beats_loss=0.01092, ecapa_loss=0.0001503, whisper_loss=0.07144, over 16805.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001588, whisper_loss=0.08998, over 3629702.75 frames. ], batch size: 64, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:35:35,017 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 21:36:00,827 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 21:36:02,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2324930.0, ans=0.0 2024-08-13 21:36:14,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2325030.0, ans=0.125 2024-08-13 21:36:17,807 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 21:36:28,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2325130.0, ans=0.125 2024-08-13 21:36:38,418 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 650, loss[loss=0.1039, beats_loss=0.008901, ecapa_loss=0.0001794, whisper_loss=0.09317, over 20199.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001597, whisper_loss=0.09084, over 3659896.41 frames. ], batch size: 81, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:36:42,826 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 21:36:56,796 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 21:37:02,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-13 21:37:09,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.389e+01 2.704e+01 3.013e+01 8.978e+01, threshold=5.408e+01, percent-clipped=2.0 2024-08-13 21:37:12,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2325430.0, ans=0.1 2024-08-13 21:37:23,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2024-08-13 21:37:27,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2325530.0, ans=0.125 2024-08-13 21:37:28,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2325530.0, ans=0.125 2024-08-13 21:37:33,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. 
limit=15.0 2024-08-13 21:37:41,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2325630.0, ans=0.125 2024-08-13 21:37:43,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 700, loss[loss=0.1059, beats_loss=0.01114, ecapa_loss=0.0001433, whisper_loss=0.09331, over 23879.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001607, whisper_loss=0.09009, over 3679450.08 frames. ], batch size: 93, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:37:48,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-08-13 21:37:49,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2325730.0, ans=0.125 2024-08-13 21:37:56,930 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 21:37:59,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2325830.0, ans=0.0 2024-08-13 21:38:03,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2325830.0, ans=0.0 2024-08-13 21:38:12,708 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 21:38:15,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2325930.0, ans=0.025 2024-08-13 21:38:23,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. 
limit=15.0 2024-08-13 21:38:27,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2326030.0, ans=0.2 2024-08-13 21:38:34,463 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-13 21:38:43,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2326130.0, ans=0.0 2024-08-13 21:38:48,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 750, loss[loss=0.09511, beats_loss=0.01209, ecapa_loss=0.0001697, whisper_loss=0.08132, over 21611.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.0001587, whisper_loss=0.08975, over 3727435.25 frames. ], batch size: 90, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:38:49,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2326230.0, ans=0.125 2024-08-13 21:38:53,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=12.0 2024-08-13 21:38:54,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2326230.0, ans=0.2 2024-08-13 21:39:02,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2326330.0, ans=0.125 2024-08-13 21:39:05,824 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 21:39:07,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0 2024-08-13 21:39:13,745 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 21:39:19,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.349e+01 2.525e+01 2.805e+01 4.000e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-13 21:39:22,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2326430.0, ans=0.125 2024-08-13 21:39:25,273 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 21:39:37,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.14 vs. limit=22.5 2024-08-13 21:39:39,986 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 21:39:54,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 800, loss[loss=0.102, beats_loss=0.009329, ecapa_loss=0.0001239, whisper_loss=0.09148, over 17316.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001589, whisper_loss=0.08955, over 3753627.80 frames. ], batch size: 62, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:39:55,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2326730.0, ans=0.125 2024-08-13 21:39:58,294 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 21:40:07,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2024-08-13 21:40:23,203 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 21:40:47,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2327130.0, ans=0.1 2024-08-13 21:40:49,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2327130.0, ans=0.125 2024-08-13 21:40:55,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.00 vs. limit=15.0 2024-08-13 21:40:59,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 850, loss[loss=0.1142, beats_loss=0.01017, ecapa_loss=0.0001459, whisper_loss=0.1026, over 22774.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01068, ecapa_loss=0.000159, whisper_loss=0.0894, over 3764436.74 frames. ], batch size: 90, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:41:13,856 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-13 21:41:21,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2327330.0, ans=0.2 2024-08-13 21:41:25,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2327430.0, ans=0.0 2024-08-13 21:41:30,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.387e+01 2.596e+01 3.123e+01 5.757e+01, threshold=5.192e+01, percent-clipped=1.0 2024-08-13 21:41:47,792 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-13 21:41:58,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2327630.0, ans=0.1 2024-08-13 21:42:04,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 900, loss[loss=0.09855, beats_loss=0.007522, ecapa_loss=0.0001676, whisper_loss=0.08936, over 20340.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001584, whisper_loss=0.08955, over 3773377.00 frames. ], batch size: 77, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:42:06,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2327730.0, ans=0.0 2024-08-13 21:42:07,629 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 33 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 21:42:33,532 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 21:42:37,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2327930.0, ans=0.125 2024-08-13 21:42:40,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2327930.0, ans=0.125 2024-08-13 21:42:44,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2328030.0, ans=0.0 2024-08-13 21:42:48,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2328030.0, ans=0.0 2024-08-13 21:42:58,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2328130.0, ans=0.125 2024-08-13 21:42:59,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2328130.0, ans=0.2 
2024-08-13 21:43:08,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2328230.0, ans=0.2 2024-08-13 21:43:09,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 950, loss[loss=0.1083, beats_loss=0.008508, ecapa_loss=0.0001919, whisper_loss=0.0979, over 15530.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001589, whisper_loss=0.08951, over 3765370.95 frames. ], batch size: 62, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:43:38,745 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 21:43:41,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.367e+01 2.632e+01 3.016e+01 5.732e+01, threshold=5.263e+01, percent-clipped=3.0 2024-08-13 21:43:46,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=12.0 2024-08-13 21:43:51,886 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 21:44:04,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2328630.0, ans=0.125 2024-08-13 21:44:08,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2328630.0, ans=0.125 2024-08-13 21:44:15,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1000, loss[loss=0.1058, beats_loss=0.009713, ecapa_loss=0.0001694, whisper_loss=0.09442, over 22631.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01067, ecapa_loss=0.0001583, whisper_loss=0.08945, over 3765827.48 frames. ], batch size: 90, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:44:24,511 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 21:44:36,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2328830.0, ans=0.125 2024-08-13 21:44:42,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=22.5 2024-08-13 21:44:56,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2329030.0, ans=0.125 2024-08-13 21:44:57,422 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-13 21:45:02,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2329030.0, ans=0.015 2024-08-13 21:45:05,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2329030.0, ans=0.125 2024-08-13 21:45:21,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1050, loss[loss=0.06755, beats_loss=0.01242, ecapa_loss=0.0001344, whisper_loss=0.05379, over 14410.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01074, ecapa_loss=0.0001578, whisper_loss=0.08868, over 3775033.16 frames. ], batch size: 58, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:45:24,903 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 21:45:51,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2329430.0, ans=0.0 2024-08-13 21:45:52,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.420e+01 2.664e+01 2.978e+01 4.899e+01, threshold=5.328e+01, percent-clipped=0.0 2024-08-13 21:45:56,206 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 21:46:00,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2329530.0, ans=0.125 2024-08-13 21:46:08,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2329530.0, ans=0.1 2024-08-13 21:46:08,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.72 vs. limit=15.0 2024-08-13 21:46:11,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-13 21:46:18,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.59 vs. limit=10.0 2024-08-13 21:46:21,168 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.184e+00 2024-08-13 21:46:26,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1100, loss[loss=0.08486, beats_loss=0.01306, ecapa_loss=0.0001396, whisper_loss=0.07041, over 14816.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01072, ecapa_loss=0.0001566, whisper_loss=0.08857, over 3771995.62 frames. ], batch size: 60, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:46:26,375 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 21:46:30,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2329730.0, ans=0.125 2024-08-13 21:46:39,557 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
19 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 21:46:50,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2329830.0, ans=0.0 2024-08-13 21:46:56,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2329930.0, ans=0.125 2024-08-13 21:47:10,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2330030.0, ans=0.125 2024-08-13 21:47:11,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2330030.0, ans=0.125 2024-08-13 21:47:32,342 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1150, loss[loss=0.09145, beats_loss=0.01261, ecapa_loss=0.0001248, whisper_loss=0.07759, over 16091.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001565, whisper_loss=0.0896, over 3791989.34 frames. ], batch size: 62, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:47:32,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2330230.0, ans=0.0 2024-08-13 21:47:33,696 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-13 21:47:46,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2330330.0, ans=0.035 2024-08-13 21:47:53,693 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 21:48:01,182 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 21:48:03,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.427e+01 2.716e+01 3.117e+01 1.034e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-13 21:48:21,044 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 21:48:28,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2330630.0, ans=0.0 2024-08-13 21:48:33,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2024-08-13 21:48:35,574 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 21:48:37,910 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1200, loss[loss=0.1021, beats_loss=0.01101, ecapa_loss=0.0001769, whisper_loss=0.08935, over 20663.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001564, whisper_loss=0.09014, over 3797529.15 frames. ], batch size: 82, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:48:38,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2330730.0, ans=0.2 2024-08-13 21:48:41,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2330730.0, ans=0.09899494936611666 2024-08-13 21:48:44,810 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 21:48:47,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2330730.0, ans=0.0 2024-08-13 21:48:49,911 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 21:49:12,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2024-08-13 21:49:13,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2330930.0, ans=0.2 2024-08-13 21:49:22,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2331030.0, ans=0.1 2024-08-13 21:49:24,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=15.0 2024-08-13 21:49:42,269 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 21:49:43,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1250, loss[loss=0.08311, beats_loss=0.01177, ecapa_loss=0.0001265, whisper_loss=0.07007, over 15507.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01076, ecapa_loss=0.0001569, whisper_loss=0.08903, over 3773714.55 frames. ], batch size: 59, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:49:49,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2024-08-13 21:49:53,574 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
25 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 21:49:56,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2331330.0, ans=0.2 2024-08-13 21:50:08,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2331430.0, ans=0.125 2024-08-13 21:50:14,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.232e+01 2.491e+01 2.766e+01 6.956e+01, threshold=4.983e+01, percent-clipped=1.0 2024-08-13 21:50:18,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2331430.0, ans=0.1 2024-08-13 21:50:35,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2331630.0, ans=0.02 2024-08-13 21:50:43,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2331630.0, ans=0.125 2024-08-13 21:50:48,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1300, loss[loss=0.1078, beats_loss=0.01161, ecapa_loss=0.0001455, whisper_loss=0.0947, over 16552.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01084, ecapa_loss=0.000156, whisper_loss=0.08926, over 3805242.15 frames. ], batch size: 67, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:50:52,809 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 21:50:54,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2331730.0, ans=0.125 2024-08-13 21:51:05,283 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-13 21:51:05,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2331830.0, ans=0.025 2024-08-13 21:51:18,939 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 21:51:27,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-13 21:51:28,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2332030.0, ans=0.125 2024-08-13 21:51:34,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2332030.0, ans=0.0 2024-08-13 21:51:36,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.99 vs. limit=22.5 2024-08-13 21:51:42,944 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 21:51:56,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1350, loss[loss=0.1081, beats_loss=0.009219, ecapa_loss=0.0001575, whisper_loss=0.09731, over 18124.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0108, ecapa_loss=0.0001555, whisper_loss=0.08953, over 3799845.54 frames. ], batch size: 74, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:52:11,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=15.0 2024-08-13 21:52:19,433 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 21:52:23,804 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 21:52:26,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2332430.0, ans=0.125 2024-08-13 21:52:30,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.343e+01 2.685e+01 2.934e+01 4.089e+01, threshold=5.369e+01, percent-clipped=0.0 2024-08-13 21:52:36,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2024-08-13 21:52:47,251 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 21:52:52,152 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 21:53:01,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2332630.0, ans=0.035 2024-08-13 21:53:06,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2332630.0, ans=0.1 2024-08-13 21:53:10,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1400, loss[loss=0.08874, beats_loss=0.01142, ecapa_loss=0.0001602, whisper_loss=0.07572, over 18116.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01075, ecapa_loss=0.0001567, whisper_loss=0.08897, over 3807226.18 frames. ], batch size: 73, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:53:12,374 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
26 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-13 21:53:12,599 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 21:53:21,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2332730.0, ans=0.1 2024-08-13 21:53:39,694 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 21:53:47,902 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 21:53:58,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2333030.0, ans=0.2 2024-08-13 21:54:24,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1450, loss[loss=0.1164, beats_loss=0.008493, ecapa_loss=0.0001803, whisper_loss=0.1061, over 23533.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01076, ecapa_loss=0.0001566, whisper_loss=0.08907, over 3783254.77 frames. ], batch size: 92, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:55:21,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.338e+01 2.604e+01 2.874e+01 4.710e+01, threshold=5.208e+01, percent-clipped=0.0 2024-08-13 21:55:34,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2333530.0, ans=0.0 2024-08-13 21:55:38,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2333530.0, ans=0.125 2024-08-13 21:56:01,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1500, loss[loss=0.1048, beats_loss=0.009955, ecapa_loss=0.0001598, whisper_loss=0.09324, over 22939.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01077, ecapa_loss=0.000156, whisper_loss=0.08856, over 3770344.12 frames. 
], batch size: 93, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:56:26,139 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 33 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 21:56:30,757 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 21:56:32,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2333930.0, ans=0.2 2024-08-13 21:56:48,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2334030.0, ans=0.1 2024-08-13 21:56:58,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2334130.0, ans=0.0 2024-08-13 21:57:14,649 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1550, loss[loss=0.08875, beats_loss=0.009137, ecapa_loss=0.0001452, whisper_loss=0.07816, over 14944.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01078, ecapa_loss=0.0001553, whisper_loss=0.08858, over 3766110.54 frames. ], batch size: 56, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:57:17,056 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 21:57:17,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2334230.0, ans=0.0 2024-08-13 21:57:30,558 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 21:57:51,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.314e+01 2.571e+01 2.868e+01 3.932e+01, threshold=5.142e+01, percent-clipped=0.0 2024-08-13 21:58:20,874 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 21:58:29,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1600, loss[loss=0.09493, beats_loss=0.0135, ecapa_loss=0.0001417, whisper_loss=0.08001, over 21938.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0108, ecapa_loss=0.0001546, whisper_loss=0.08874, over 3811246.00 frames. ], batch size: 89, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:58:43,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2334830.0, ans=0.09899494936611666 2024-08-13 21:58:49,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2334830.0, ans=0.125 2024-08-13 21:58:52,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2334830.0, ans=0.125 2024-08-13 21:58:54,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2334830.0, ans=0.07 2024-08-13 21:59:10,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2334930.0, ans=0.125 2024-08-13 21:59:17,021 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 21:59:26,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2335130.0, ans=0.125 2024-08-13 21:59:36,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2335130.0, ans=0.1 2024-08-13 21:59:39,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2335130.0, ans=0.0 2024-08-13 21:59:41,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1650, loss[loss=0.09398, beats_loss=0.01178, ecapa_loss=0.0001545, whisper_loss=0.08065, over 15256.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01079, ecapa_loss=0.0001542, whisper_loss=0.08913, over 3822703.94 frames. ], batch size: 61, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:59:46,339 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-13 21:59:49,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2335230.0, ans=0.2 2024-08-13 21:59:50,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2335230.0, ans=0.125 2024-08-13 21:59:50,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2335230.0, ans=0.1 2024-08-13 22:00:15,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.351e+01 2.606e+01 2.894e+01 4.343e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-13 22:00:23,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2335530.0, ans=0.0 2024-08-13 22:00:44,454 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2335630.0, ans=0.0 2024-08-13 22:00:52,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1700, loss[loss=0.1256, beats_loss=0.01196, ecapa_loss=0.0001241, whisper_loss=0.1124, over 24592.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001547, whisper_loss=0.0899, over 3841339.97 frames. ], batch size: 93, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:00:56,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2335730.0, ans=0.125 2024-08-13 22:01:21,213 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 22:01:25,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-08-13 22:01:44,320 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 22:01:44,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2336030.0, ans=0.0 2024-08-13 22:01:48,698 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 22:02:02,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1750, loss[loss=0.1236, beats_loss=0.009224, ecapa_loss=0.0001367, whisper_loss=0.113, over 17820.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0001537, whisper_loss=0.09006, over 3845045.63 frames. ], batch size: 65, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:02:04,064 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-13 22:02:05,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=15.0 2024-08-13 22:02:12,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2336230.0, ans=0.025 2024-08-13 22:02:13,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2336230.0, ans=0.125 2024-08-13 22:02:15,609 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-13 22:02:18,264 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-13 22:02:19,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2336330.0, ans=0.0 2024-08-13 22:02:20,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-13 22:02:24,410 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 22:02:28,521 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 22:02:35,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.335e+01 2.606e+01 3.098e+01 1.901e+02, threshold=5.212e+01, percent-clipped=3.0 2024-08-13 22:02:56,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2336630.0, ans=0.125 2024-08-13 22:03:11,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1800, loss[loss=0.08462, beats_loss=0.01192, ecapa_loss=0.0001638, whisper_loss=0.07106, over 19848.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001552, whisper_loss=0.09032, over 3853557.19 frames. 
], batch size: 82, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:03:11,971 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 22:03:27,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=12.0 2024-08-13 22:03:45,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2336930.0, ans=0.125 2024-08-13 22:04:00,362 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.372e-01 2024-08-13 22:04:05,515 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 22:04:16,683 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 22:04:20,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1850, loss[loss=0.1054, beats_loss=0.009047, ecapa_loss=0.0001329, whisper_loss=0.09504, over 15206.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001561, whisper_loss=0.09035, over 3835715.98 frames. 
], batch size: 56, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:04:21,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2337230.0, ans=0.0 2024-08-13 22:04:26,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2337230.0, ans=0.0 2024-08-13 22:04:26,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2337230.0, ans=0.125 2024-08-13 22:04:52,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.324e+01 2.518e+01 2.718e+01 4.142e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-13 22:05:04,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2337530.0, ans=0.2 2024-08-13 22:05:30,327 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1900, loss[loss=0.09398, beats_loss=0.01276, ecapa_loss=0.0001214, whisper_loss=0.08001, over 20805.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.0001556, whisper_loss=0.08944, over 3825030.07 frames. ], batch size: 81, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:05:59,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2337830.0, ans=0.125 2024-08-13 22:06:15,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2337930.0, ans=0.0 2024-08-13 22:06:37,607 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 22:06:37,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2338130.0, ans=0.0 2024-08-13 22:06:42,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.70 vs. 
limit=12.0 2024-08-13 22:06:52,659 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 1950, loss[loss=0.09733, beats_loss=0.01114, ecapa_loss=0.0001822, whisper_loss=0.08437, over 21938.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001548, whisper_loss=0.08965, over 3837309.65 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:06:53,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-13 22:06:59,141 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 22:07:30,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.359e+01 2.594e+01 2.893e+01 6.920e+01, threshold=5.188e+01, percent-clipped=1.0 2024-08-13 22:07:54,363 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 22:08:13,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2000, loss[loss=0.09262, beats_loss=0.01116, ecapa_loss=0.0001665, whisper_loss=0.0798, over 19619.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01076, ecapa_loss=0.0001542, whisper_loss=0.08931, over 3830541.60 frames. ], batch size: 81, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:08:20,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.02 vs. limit=6.0 2024-08-13 22:08:32,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2338830.0, ans=0.0 2024-08-13 22:08:55,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2338930.0, ans=0.0 2024-08-13 22:09:05,496 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
29 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 22:09:35,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2050, loss[loss=0.09814, beats_loss=0.01197, ecapa_loss=0.0001621, whisper_loss=0.08455, over 22867.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01079, ecapa_loss=0.000154, whisper_loss=0.08936, over 3838259.22 frames. ], batch size: 92, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:09:51,535 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 22:10:13,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.646e+01 3.086e+01 1.043e+02, threshold=5.292e+01, percent-clipped=1.0 2024-08-13 22:10:25,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2339530.0, ans=0.1 2024-08-13 22:10:57,121 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2100, loss[loss=0.1059, beats_loss=0.009512, ecapa_loss=0.0001986, whisper_loss=0.09435, over 18277.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01084, ecapa_loss=0.0001542, whisper_loss=0.08897, over 3824251.71 frames. ], batch size: 74, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:11:29,985 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 33 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 22:11:59,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2024-08-13 22:12:06,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0 2024-08-13 22:12:14,775 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2150, loss[loss=0.09245, beats_loss=0.01017, ecapa_loss=0.0001961, whisper_loss=0.08032, over 15634.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01094, ecapa_loss=0.0001544, whisper_loss=0.08886, over 3820626.99 frames. ], batch size: 65, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:12:14,973 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 22:12:17,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2340230.0, ans=0.0 2024-08-13 22:12:31,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2340330.0, ans=0.125 2024-08-13 22:12:54,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.324e+01 2.581e+01 2.963e+01 1.302e+02, threshold=5.163e+01, percent-clipped=1.0 2024-08-13 22:13:06,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2340530.0, ans=0.1 2024-08-13 22:13:14,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=12.0 2024-08-13 22:13:36,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2200, loss[loss=0.09585, beats_loss=0.01347, ecapa_loss=0.0001602, whisper_loss=0.08078, over 21106.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01093, ecapa_loss=0.0001538, whisper_loss=0.08952, over 3807927.70 frames. ], batch size: 85, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:13:36,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2340730.0, ans=0.125 2024-08-13 22:13:38,937 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
34 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 22:13:39,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2340730.0, ans=0.2 2024-08-13 22:13:41,866 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 22:13:58,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2340830.0, ans=0.125 2024-08-13 22:14:05,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2340830.0, ans=0.125 2024-08-13 22:14:27,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2341030.0, ans=0.125 2024-08-13 22:14:28,518 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 22:14:37,393 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 22:14:51,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2341130.0, ans=0.125 2024-08-13 22:14:57,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2250, loss[loss=0.1221, beats_loss=0.008944, ecapa_loss=0.0001634, whisper_loss=0.1115, over 21123.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01088, ecapa_loss=0.0001561, whisper_loss=0.09022, over 3808885.54 frames. ], batch size: 84, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:15:00,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2341230.0, ans=0.07 2024-08-13 22:15:04,061 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 22:15:08,582 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
28 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 22:15:34,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2341430.0, ans=0.125 2024-08-13 22:15:35,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.396e+01 2.660e+01 2.938e+01 1.173e+02, threshold=5.320e+01, percent-clipped=2.0 2024-08-13 22:15:45,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2341530.0, ans=0.2 2024-08-13 22:15:55,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2341530.0, ans=0.2 2024-08-13 22:16:16,693 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-13 22:16:18,967 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2300, loss[loss=0.09956, beats_loss=0.008902, ecapa_loss=0.0001766, whisper_loss=0.08889, over 14118.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0108, ecapa_loss=0.0001577, whisper_loss=0.09033, over 3850601.23 frames. ], batch size: 57, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:16:19,132 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 22:16:43,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2341830.0, ans=0.125 2024-08-13 22:17:10,173 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-13 22:17:13,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2342030.0, ans=0.0 2024-08-13 22:17:39,680 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2350, loss[loss=0.1193, beats_loss=0.009221, ecapa_loss=0.0001457, whisper_loss=0.1086, over 14984.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01085, ecapa_loss=0.0001575, whisper_loss=0.09017, over 3837010.74 frames. ], batch size: 54, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:17:42,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2342230.0, ans=0.035 2024-08-13 22:17:48,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2342230.0, ans=0.0 2024-08-13 22:18:03,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2342330.0, ans=0.125 2024-08-13 22:18:04,576 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 22:18:10,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2342330.0, ans=0.125 2024-08-13 22:18:19,700 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.389e+01 2.636e+01 2.881e+01 1.786e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 22:18:30,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2342530.0, ans=0.125 2024-08-13 22:18:34,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2342530.0, ans=0.125 2024-08-13 22:18:34,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2342530.0, ans=0.125 2024-08-13 22:18:35,707 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 22:18:43,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2342630.0, ans=0.0 2024-08-13 22:18:47,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2342630.0, ans=0.125 2024-08-13 22:18:53,334 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-13 22:18:59,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2342730.0, ans=0.2 2024-08-13 22:19:01,119 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2400, loss[loss=0.0801, beats_loss=0.01215, ecapa_loss=0.0001468, whisper_loss=0.06648, over 17656.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01084, ecapa_loss=0.0001573, whisper_loss=0.09006, over 3840821.79 frames. ], batch size: 70, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:19:03,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2342730.0, ans=0.125 2024-08-13 22:19:11,210 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-13 22:19:14,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2342730.0, ans=0.125 2024-08-13 22:19:22,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2342830.0, ans=0.125 2024-08-13 22:19:37,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2342930.0, ans=0.2 2024-08-13 22:19:46,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2024-08-13 22:19:48,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2024-08-13 22:19:50,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2343030.0, ans=0.125 2024-08-13 22:20:03,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2343030.0, ans=0.1 2024-08-13 22:20:24,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2450, loss[loss=0.09655, beats_loss=0.01072, ecapa_loss=0.0001486, whisper_loss=0.08435, over 16529.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.0001581, whisper_loss=0.09049, over 3865493.92 frames. ], batch size: 65, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:20:24,575 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 29 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 22:20:30,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2343230.0, ans=0.125 2024-08-13 22:21:05,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.291e+01 2.587e+01 2.997e+01 1.554e+02, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:21:10,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.00 vs. limit=6.0 2024-08-13 22:21:37,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2343630.0, ans=0.125 2024-08-13 22:21:47,786 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2500, loss[loss=0.1035, beats_loss=0.009434, ecapa_loss=0.0001574, whisper_loss=0.09245, over 14276.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.0001592, whisper_loss=0.09146, over 3855200.64 frames. ], batch size: 56, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:22:01,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2343730.0, ans=0.2 2024-08-13 22:22:07,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2343830.0, ans=0.125 2024-08-13 22:22:25,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2343930.0, ans=0.2 2024-08-13 22:22:25,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2343930.0, ans=0.1 2024-08-13 22:22:26,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2343930.0, ans=0.1 2024-08-13 22:22:39,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2344030.0, ans=0.1 2024-08-13 22:22:41,193 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 22:22:53,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2344130.0, ans=0.2 2024-08-13 22:23:00,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2344130.0, ans=0.2 2024-08-13 22:23:06,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. 
limit=15.0 2024-08-13 22:23:12,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2550, loss[loss=0.1001, beats_loss=0.01065, ecapa_loss=0.0001674, whisper_loss=0.08777, over 22118.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001607, whisper_loss=0.09181, over 3881869.07 frames. ], batch size: 93, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:23:13,036 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-13 22:23:25,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2344230.0, ans=0.0 2024-08-13 22:23:44,789 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 22:23:53,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.329e+01 2.677e+01 3.229e+01 5.510e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 22:23:55,662 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 22:24:05,212 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-13 22:24:06,739 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 22:24:28,126 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 22:24:28,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2024-08-13 22:24:30,948 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 22:24:35,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2600, loss[loss=0.074, beats_loss=0.01289, ecapa_loss=0.0001366, whisper_loss=0.05974, over 16348.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001601, whisper_loss=0.09148, over 3851115.20 frames. ], batch size: 69, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:24:42,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2344730.0, ans=0.2 2024-08-13 22:25:03,114 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 13 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 22:25:11,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2344930.0, ans=0.0 2024-08-13 22:25:12,480 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 22:25:47,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2345130.0, ans=0.125 2024-08-13 22:25:54,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2650, loss[loss=0.07422, beats_loss=0.01328, ecapa_loss=0.0001332, whisper_loss=0.05961, over 14664.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.00016, whisper_loss=0.09156, over 3845970.69 frames. 
], batch size: 57, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:25:54,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2345230.0, ans=0.0 2024-08-13 22:25:58,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2345230.0, ans=0.125 2024-08-13 22:26:04,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2345230.0, ans=0.0 2024-08-13 22:26:18,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2345330.0, ans=0.125 2024-08-13 22:26:31,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.316e+01 2.514e+01 2.879e+01 4.241e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-13 22:26:36,055 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 22:26:37,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2024-08-13 22:26:43,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2345530.0, ans=0.0 2024-08-13 22:26:54,608 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 22:26:58,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.92 vs. 
limit=15.0 2024-08-13 22:27:01,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2345630.0, ans=0.1 2024-08-13 22:27:06,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2024-08-13 22:27:13,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2700, loss[loss=0.09547, beats_loss=0.01289, ecapa_loss=9.765e-05, whisper_loss=0.0816, over 17409.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.000159, whisper_loss=0.09088, over 3868097.87 frames. ], batch size: 64, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:27:14,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2024-08-13 22:27:21,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2345730.0, ans=0.0 2024-08-13 22:27:39,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2345830.0, ans=0.0 2024-08-13 22:27:43,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2345930.0, ans=0.2 2024-08-13 22:27:57,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2345930.0, ans=0.1 2024-08-13 22:28:08,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.35 vs. 
limit=22.5 2024-08-13 22:28:28,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2346130.0, ans=0.2 2024-08-13 22:28:32,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2750, loss[loss=0.09611, beats_loss=0.01187, ecapa_loss=0.0001507, whisper_loss=0.08274, over 18440.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001583, whisper_loss=0.0909, over 3881512.66 frames. ], batch size: 75, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:28:33,806 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 22:28:36,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2024-08-13 22:28:37,020 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 22:29:11,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.410e+01 2.665e+01 3.029e+01 5.908e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 22:29:11,940 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 22:29:19,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2346530.0, ans=0.0 2024-08-13 22:29:46,430 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 22:29:50,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2800, loss[loss=0.1138, beats_loss=0.00894, ecapa_loss=0.0001528, whisper_loss=0.1033, over 19725.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.000158, whisper_loss=0.09154, over 3868993.22 frames. 
], batch size: 77, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:30:20,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2346930.0, ans=0.125 2024-08-13 22:30:24,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2346930.0, ans=0.125 2024-08-13 22:30:26,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-13 22:30:32,246 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 22:30:34,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2346930.0, ans=0.2 2024-08-13 22:30:34,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2346930.0, ans=0.1 2024-08-13 22:30:45,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2024-08-13 22:30:57,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2347130.0, ans=0.125 2024-08-13 22:31:11,723 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 22:31:15,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2850, loss[loss=0.1187, beats_loss=0.009672, ecapa_loss=0.0001624, whisper_loss=0.1074, over 16762.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.0001577, whisper_loss=0.09241, over 3844957.86 frames. ], batch size: 64, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:31:43,089 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 22:31:48,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2347430.0, ans=0.1 2024-08-13 22:31:52,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.368e+01 2.681e+01 3.083e+01 7.841e+01, threshold=5.363e+01, percent-clipped=3.0 2024-08-13 22:31:58,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-13 22:32:06,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2347530.0, ans=0.125 2024-08-13 22:32:17,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2347530.0, ans=0.125 2024-08-13 22:32:43,699 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2900, loss[loss=0.1002, beats_loss=0.01087, ecapa_loss=0.00015, whisper_loss=0.08784, over 15398.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.0001598, whisper_loss=0.09192, over 3829898.86 frames. ], batch size: 63, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:33:23,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.38 vs. limit=22.5 2024-08-13 22:33:35,340 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
22 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-13 22:34:01,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2348030.0, ans=0.0 2024-08-13 22:34:28,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2348230.0, ans=0.125 2024-08-13 22:34:31,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 2950, loss[loss=0.1034, beats_loss=0.01007, ecapa_loss=0.0001627, whisper_loss=0.09168, over 19941.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01064, ecapa_loss=0.0001619, whisper_loss=0.09214, over 3832990.51 frames. ], batch size: 78, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:34:34,667 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 22:34:38,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2348230.0, ans=0.125 2024-08-13 22:35:00,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2348330.0, ans=0.1 2024-08-13 22:35:03,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2348330.0, ans=0.125 2024-08-13 22:35:07,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. 
limit=22.5 2024-08-13 22:35:13,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2348330.0, ans=0.0 2024-08-13 22:35:29,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.427e+01 2.649e+01 3.118e+01 1.077e+02, threshold=5.298e+01, percent-clipped=4.0 2024-08-13 22:35:48,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2348530.0, ans=0.1 2024-08-13 22:36:00,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2348530.0, ans=0.0 2024-08-13 22:36:37,582 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3000, loss[loss=0.08805, beats_loss=0.01051, ecapa_loss=9.799e-05, whisper_loss=0.07656, over 16012.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01063, ecapa_loss=0.0001618, whisper_loss=0.09266, over 3868847.04 frames. ], batch size: 56, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:36:37,583 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 22:37:40,759 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005533, whisper_loss=0.2471, over 922467.00 frames. 2024-08-13 22:38:04,818 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames. 2024-08-13 22:40:54,337 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.6736e-04, 1.0931e-02, 1.2297e-02, 3.7933e+00, 8.6286e-03, 2.1107e-02, 3.4841e-02, 2.3958e-02], device='cuda:0') 2024-08-13 22:41:12,720 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-13 22:41:12,730 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-13 22:41:32,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-08-13 22:41:35,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2348830.0, ans=0.0 2024-08-13 22:41:47,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2348930.0, ans=0.0 2024-08-13 22:41:54,549 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 22:42:22,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2349030.0, ans=0.1 2024-08-13 22:42:36,731 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 22:42:41,668 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3050, loss[loss=0.1155, beats_loss=0.007981, ecapa_loss=0.0001911, whisper_loss=0.1057, over 22363.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01065, ecapa_loss=0.0001621, whisper_loss=0.09256, over 3857561.56 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:42:43,970 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 22:42:47,617 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-13 22:43:14,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2349330.0, ans=0.0 2024-08-13 22:43:17,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.85 vs. 
limit=15.0 2024-08-13 22:43:25,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.432e+01 2.716e+01 3.181e+01 1.148e+02, threshold=5.433e+01, percent-clipped=2.0 2024-08-13 22:43:43,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2349530.0, ans=0.95 2024-08-13 22:43:45,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2024-08-13 22:43:48,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-08-13 22:44:09,542 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 22 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-13 22:44:11,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3100, loss[loss=0.09997, beats_loss=0.01164, ecapa_loss=0.0001344, whisper_loss=0.08698, over 18525.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01074, ecapa_loss=0.0001616, whisper_loss=0.09246, over 3871887.85 frames. ], batch size: 72, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:45:34,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2350130.0, ans=0.5 2024-08-13 22:45:37,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3150, loss[loss=0.112, beats_loss=0.01169, ecapa_loss=0.000156, whisper_loss=0.09876, over 22301.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01078, ecapa_loss=0.0001613, whisper_loss=0.09255, over 3871855.51 frames. 
], batch size: 91, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:45:54,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2350330.0, ans=0.0 2024-08-13 22:45:55,796 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 22:45:59,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2350330.0, ans=0.2 2024-08-13 22:46:02,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2350330.0, ans=0.1 2024-08-13 22:46:17,607 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 22:46:20,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.358e+01 2.601e+01 2.838e+01 4.154e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 22:46:28,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2350430.0, ans=0.125 2024-08-13 22:46:36,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2350530.0, ans=0.125 2024-08-13 22:46:42,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2350530.0, ans=0.0 2024-08-13 22:47:02,411 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:47:07,233 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3200, loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001541, whisper_loss=0.09219, over 23305.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01078, ecapa_loss=0.0001611, whisper_loss=0.09303, over 3885815.79 frames. 
], batch size: 94, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:47:11,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2350730.0, ans=0.125 2024-08-13 22:47:55,248 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 22:47:58,186 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 22:48:22,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-13 22:48:37,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3250, loss[loss=0.1072, beats_loss=0.0113, ecapa_loss=0.0001706, whisper_loss=0.09419, over 19424.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01078, ecapa_loss=0.000162, whisper_loss=0.09283, over 3882953.84 frames. ], batch size: 76, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:48:42,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2351230.0, ans=0.125 2024-08-13 22:48:52,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2351230.0, ans=0.1 2024-08-13 22:48:53,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2351330.0, ans=0.125 2024-08-13 22:49:19,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.385e+01 2.597e+01 2.999e+01 7.217e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-13 22:49:20,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2351430.0, ans=0.125 2024-08-13 22:49:48,919 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2351630.0, ans=0.2 2024-08-13 22:49:56,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-13 22:49:57,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2351630.0, ans=0.035 2024-08-13 22:50:05,130 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3300, loss[loss=0.1054, beats_loss=0.01336, ecapa_loss=0.0001501, whisper_loss=0.09056, over 22976.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01083, ecapa_loss=0.0001612, whisper_loss=0.09273, over 3881983.02 frames. ], batch size: 94, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:50:16,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2351730.0, ans=0.0 2024-08-13 22:51:04,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2352030.0, ans=0.125 2024-08-13 22:51:06,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2352030.0, ans=0.5 2024-08-13 22:51:13,976 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 22:51:14,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2352130.0, ans=0.125 2024-08-13 22:51:30,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3350, loss[loss=0.1031, beats_loss=0.00965, ecapa_loss=0.0002011, whisper_loss=0.09143, over 18153.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01083, ecapa_loss=0.0001607, whisper_loss=0.09265, over 3877357.76 frames. 
], batch size: 74, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:51:33,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2352230.0, ans=0.0 2024-08-13 22:51:40,410 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-13 22:51:46,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2352230.0, ans=0.0 2024-08-13 22:51:55,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=12.0 2024-08-13 22:52:02,243 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 22:52:05,811 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 22:52:11,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.332e+01 2.587e+01 3.048e+01 7.749e+01, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:52:28,018 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 22:52:37,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2352530.0, ans=0.0 2024-08-13 22:52:41,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=12.0 2024-08-13 22:52:56,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3400, loss[loss=0.09527, beats_loss=0.01251, ecapa_loss=0.0001366, whisper_loss=0.08139, over 23369.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001606, whisper_loss=0.09198, over 3881330.25 frames. 
], batch size: 91, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:52:59,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2352730.0, ans=0.0 2024-08-13 22:53:07,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2352730.0, ans=0.125 2024-08-13 22:53:31,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2352930.0, ans=0.0 2024-08-13 22:53:34,683 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 18 from LS+wenet, 28 from Vox, 49 fro AS 2024-08-13 22:53:38,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2352930.0, ans=0.1 2024-08-13 22:53:38,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2352930.0, ans=0.0 2024-08-13 22:54:06,827 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 22:54:07,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-08-13 22:54:12,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2024-08-13 22:54:26,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3450, loss[loss=0.08896, beats_loss=0.01032, ecapa_loss=0.0002103, whisper_loss=0.07654, over 19014.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001608, whisper_loss=0.09148, over 3918327.40 frames. 
], batch size: 82, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:54:29,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2353230.0, ans=0.125 2024-08-13 22:54:37,518 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 22:54:49,316 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 22:54:53,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2353330.0, ans=0.1 2024-08-13 22:55:01,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2353430.0, ans=0.125 2024-08-13 22:55:09,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.393e+01 2.606e+01 2.901e+01 5.659e+01, threshold=5.211e+01, percent-clipped=1.0 2024-08-13 22:55:16,114 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 22:55:21,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2353530.0, ans=0.0 2024-08-13 22:55:22,120 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 24 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-13 22:55:27,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2353530.0, ans=0.125 2024-08-13 22:55:39,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2353630.0, ans=0.0 2024-08-13 22:55:52,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3500, loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001565, whisper_loss=0.09022, over 19654.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01082, ecapa_loss=0.0001605, whisper_loss=0.0906, over 3890714.50 frames. ], batch size: 79, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:55:58,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2353730.0, ans=0.125 2024-08-13 22:56:22,904 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 22:56:46,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2024-08-13 22:56:48,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2354030.0, ans=0.125 2024-08-13 22:56:50,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2354030.0, ans=0.125 2024-08-13 22:56:50,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2354030.0, ans=0.125 2024-08-13 22:56:57,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2354130.0, ans=0.125 2024-08-13 22:56:57,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2354130.0, ans=0.0 2024-08-13 22:57:01,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2354130.0, ans=0.125 2024-08-13 22:57:12,858 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 22:57:15,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3550, loss[loss=0.1164, beats_loss=0.007897, ecapa_loss=0.000163, whisper_loss=0.1068, over 16013.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001604, whisper_loss=0.09087, over 3884775.22 frames. ], batch size: 63, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:57:21,386 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 22:57:39,141 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 22:57:45,063 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 22:57:48,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2354430.0, ans=0.125 2024-08-13 22:57:54,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.375e+01 2.617e+01 2.958e+01 4.205e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 22:57:59,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2354430.0, ans=0.125 2024-08-13 22:58:07,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2354530.0, ans=0.0 2024-08-13 22:58:36,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3600, loss[loss=0.1165, beats_loss=0.008565, ecapa_loss=0.0002102, whisper_loss=0.1058, over 19439.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001617, whisper_loss=0.09167, over 3895837.01 frames. ], batch size: 81, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:58:43,719 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 22:58:53,335 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 22:59:00,071 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.445e-01 2024-08-13 22:59:26,521 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:59:30,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2355030.0, ans=0.125 2024-08-13 22:59:40,954 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 14 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 22:59:55,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2355230.0, ans=0.125 2024-08-13 22:59:56,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3650, loss[loss=0.1138, beats_loss=0.01009, ecapa_loss=0.0001728, whisper_loss=0.1019, over 17659.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001613, whisper_loss=0.09142, over 3876123.23 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:00:00,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2355230.0, ans=0.1 2024-08-13 23:00:01,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2355230.0, ans=0.0 2024-08-13 23:00:01,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2355230.0, ans=0.125 2024-08-13 23:00:14,983 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 23:00:16,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2355330.0, ans=0.0 2024-08-13 23:00:16,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2024-08-13 23:00:29,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2355430.0, ans=0.1 2024-08-13 23:00:34,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.445e+01 2.700e+01 3.239e+01 5.632e+01, threshold=5.401e+01, percent-clipped=1.0 2024-08-13 23:00:35,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2355430.0, ans=0.2 2024-08-13 23:00:47,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2355530.0, ans=0.125 2024-08-13 23:01:00,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2355630.0, ans=0.0 2024-08-13 23:01:15,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3700, loss[loss=0.09134, beats_loss=0.01236, ecapa_loss=0.0001728, whisper_loss=0.07725, over 19795.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001614, whisper_loss=0.09122, over 3860603.54 frames. 
], batch size: 83, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:01:18,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2355730.0, ans=0.125 2024-08-13 23:01:31,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2355830.0, ans=0.2 2024-08-13 23:01:44,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2355830.0, ans=0.125 2024-08-13 23:01:54,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2355930.0, ans=0.0 2024-08-13 23:01:58,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2355930.0, ans=0.125 2024-08-13 23:02:19,060 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 23:02:20,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2356130.0, ans=0.2 2024-08-13 23:02:21,839 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 23:02:34,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3750, loss[loss=0.1195, beats_loss=0.009894, ecapa_loss=0.0001617, whisper_loss=0.108, over 18033.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001605, whisper_loss=0.09115, over 3848878.43 frames. 
], batch size: 69, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:02:46,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2356230.0, ans=0.1 2024-08-13 23:02:46,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2356230.0, ans=0.125 2024-08-13 23:03:10,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.349e+01 2.622e+01 2.917e+01 8.940e+01, threshold=5.244e+01, percent-clipped=1.0 2024-08-13 23:03:16,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2356430.0, ans=0.1 2024-08-13 23:03:20,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2356530.0, ans=0.0 2024-08-13 23:03:24,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2356530.0, ans=0.0 2024-08-13 23:03:30,705 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 23:03:44,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.16 vs. limit=10.0 2024-08-13 23:03:49,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3800, loss[loss=0.1136, beats_loss=0.01031, ecapa_loss=0.0001586, whisper_loss=0.1017, over 15657.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001609, whisper_loss=0.0908, over 3816581.05 frames. ], batch size: 60, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:03:50,771 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 23:03:52,610 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 23:03:58,245 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 23:04:03,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2356830.0, ans=0.0 2024-08-13 23:04:18,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2356930.0, ans=0.125 2024-08-13 23:05:03,046 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 23:05:07,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3850, loss[loss=0.1142, beats_loss=0.009453, ecapa_loss=0.0001701, whisper_loss=0.103, over 23036.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.0001605, whisper_loss=0.09099, over 3861415.81 frames. ], batch size: 91, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:05:28,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2357330.0, ans=0.0 2024-08-13 23:05:38,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2357430.0, ans=0.125 2024-08-13 23:05:44,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.323e+01 2.537e+01 2.804e+01 4.147e+01, threshold=5.073e+01, percent-clipped=0.0 2024-08-13 23:05:59,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2357530.0, ans=0.05 2024-08-13 23:06:15,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2357630.0, ans=0.0 2024-08-13 23:06:19,621 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 23:06:21,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2357630.0, ans=15.0 2024-08-13 23:06:23,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3900, loss[loss=0.09582, beats_loss=0.01173, ecapa_loss=0.0001342, whisper_loss=0.08274, over 16795.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001621, whisper_loss=0.09177, over 3842098.49 frames. ], batch size: 63, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:06:52,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2357830.0, ans=0.1 2024-08-13 23:06:55,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2357930.0, ans=0.1 2024-08-13 23:07:20,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2358030.0, ans=0.0 2024-08-13 23:07:22,470 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 23:07:28,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2358130.0, ans=0.2 2024-08-13 23:07:41,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 3950, loss[loss=0.1025, beats_loss=0.01169, ecapa_loss=0.0001397, whisper_loss=0.08938, over 22488.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001627, whisper_loss=0.09167, over 3890875.45 frames. 
], batch size: 88, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:08:01,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2358330.0, ans=0.0 2024-08-13 23:08:03,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-13 23:08:07,560 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 23:08:18,731 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 23:08:20,320 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.511e+01 2.750e+01 3.070e+01 4.670e+01, threshold=5.499e+01, percent-clipped=0.0 2024-08-13 23:08:29,647 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 23:08:39,944 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 23:08:53,077 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 23:08:57,428 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4000, loss[loss=0.09148, beats_loss=0.01151, ecapa_loss=0.0001504, whisper_loss=0.07846, over 20820.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001624, whisper_loss=0.09159, over 3907539.23 frames. ], batch size: 84, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:08:59,033 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-13 23:09:08,102 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 23:09:17,811 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 23:09:19,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-13 23:09:30,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2358930.0, ans=0.1 2024-08-13 23:09:30,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-13 23:09:34,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2358930.0, ans=0.125 2024-08-13 23:09:35,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2358930.0, ans=0.2 2024-08-13 23:09:47,163 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 23:09:51,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2359030.0, ans=0.05 2024-08-13 23:09:54,323 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 23:10:11,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2359130.0, ans=10.0 2024-08-13 23:10:15,200 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4050, loss[loss=0.1068, beats_loss=0.009631, ecapa_loss=0.000175, whisper_loss=0.09546, over 19318.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001626, whisper_loss=0.09179, over 3908980.46 frames. 
], batch size: 74, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:10:21,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2359230.0, ans=0.0 2024-08-13 23:10:21,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2359230.0, ans=0.125 2024-08-13 23:10:31,248 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 23:10:36,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2359330.0, ans=0.0 2024-08-13 23:10:43,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2359430.0, ans=0.125 2024-08-13 23:10:43,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=2359430.0, ans=12.0 2024-08-13 23:10:51,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.408e+01 2.659e+01 2.975e+01 6.287e+01, threshold=5.318e+01, percent-clipped=1.0 2024-08-13 23:10:52,431 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:10:58,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=12.0 2024-08-13 23:11:23,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=22.5 2024-08-13 23:11:29,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4100, loss[loss=0.117, beats_loss=0.01049, ecapa_loss=0.0001786, whisper_loss=0.1047, over 19598.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001619, whisper_loss=0.09194, over 3893091.61 frames. 
], batch size: 77, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:12:01,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2359930.0, ans=0.0 2024-08-13 23:12:09,610 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-236000.pt 2024-08-13 23:12:34,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2360130.0, ans=0.125 2024-08-13 23:12:37,422 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 23:12:48,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4150, loss[loss=0.1012, beats_loss=0.0128, ecapa_loss=0.0001431, whisper_loss=0.087, over 16784.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.000163, whisper_loss=0.09133, over 3878505.27 frames. ], batch size: 67, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:12:48,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2360230.0, ans=0.2 2024-08-13 23:12:59,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 23:13:25,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.420e+01 2.616e+01 2.987e+01 7.044e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 23:13:30,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.59 vs. limit=12.0 2024-08-13 23:13:53,562 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-13 23:14:00,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2360630.0, ans=0.125 2024-08-13 23:14:03,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4200, loss[loss=0.1087, beats_loss=0.01107, ecapa_loss=0.0001579, whisper_loss=0.09606, over 19667.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01081, ecapa_loss=0.000162, whisper_loss=0.09056, over 3869480.02 frames. ], batch size: 75, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:14:09,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2360730.0, ans=0.125 2024-08-13 23:14:18,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2360830.0, ans=0.2 2024-08-13 23:14:25,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2360830.0, ans=0.0 2024-08-13 23:14:44,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2361030.0, ans=0.2 2024-08-13 23:14:57,991 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 23:15:05,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2361130.0, ans=0.125 2024-08-13 23:15:12,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4250, loss[loss=0.1169, beats_loss=0.01056, ecapa_loss=0.0001447, whisper_loss=0.1049, over 22116.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01081, ecapa_loss=0.0001621, whisper_loss=0.09102, over 3866649.65 frames. 
], batch size: 86, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:15:12,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2361230.0, ans=0.2 2024-08-13 23:15:20,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2361230.0, ans=0.125 2024-08-13 23:15:21,504 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 23:15:28,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2361330.0, ans=0.125 2024-08-13 23:15:29,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2361330.0, ans=0.1 2024-08-13 23:15:30,946 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 23:15:44,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.294e+01 2.587e+01 2.870e+01 6.296e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-13 23:15:49,032 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 23:16:06,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2361630.0, ans=0.0 2024-08-13 23:16:13,412 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 23:16:17,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4300, loss[loss=0.09246, beats_loss=0.01166, ecapa_loss=0.0001481, whisper_loss=0.07932, over 20623.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01083, ecapa_loss=0.0001608, whisper_loss=0.08985, over 3857829.11 frames. 
], batch size: 82, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:16:19,061 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 23:17:29,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2361930.0, ans=0.125 2024-08-13 23:17:36,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-13 23:18:13,051 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4350, loss[loss=0.08394, beats_loss=0.01239, ecapa_loss=0.0001949, whisper_loss=0.0696, over 21615.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01081, ecapa_loss=0.0001607, whisper_loss=0.08957, over 3847605.34 frames. ], batch size: 92, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:18:31,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2362330.0, ans=0.125 2024-08-13 23:18:42,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2362430.0, ans=0.125 2024-08-13 23:18:52,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.337e+01 2.576e+01 3.012e+01 4.056e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-13 23:19:06,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-08-13 23:19:11,477 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 23:19:13,623 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
31 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 23:19:14,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0 2024-08-13 23:19:16,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2362630.0, ans=0.1 2024-08-13 23:19:20,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2362630.0, ans=0.0 2024-08-13 23:19:28,088 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 23:19:29,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2362630.0, ans=10.0 2024-08-13 23:19:33,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4400, loss[loss=0.1081, beats_loss=0.0104, ecapa_loss=0.0001783, whisper_loss=0.09595, over 22558.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.00016, whisper_loss=0.09107, over 3854284.28 frames. ], batch size: 92, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:19:37,129 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 23:19:37,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2362730.0, ans=10.0 2024-08-13 23:19:48,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2362830.0, ans=0.125 2024-08-13 23:19:51,783 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 23:20:04,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. 
limit=10.0 2024-08-13 23:20:12,907 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 8 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-13 23:20:20,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2363030.0, ans=0.2 2024-08-13 23:20:20,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2363030.0, ans=0.125 2024-08-13 23:20:34,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2363130.0, ans=0.125 2024-08-13 23:20:35,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2363130.0, ans=0.0 2024-08-13 23:20:48,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4450, loss[loss=0.07697, beats_loss=0.01288, ecapa_loss=0.0001489, whisper_loss=0.0626, over 14604.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001594, whisper_loss=0.09122, over 3845638.07 frames. 
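In each `loss[...]` entry the total appears consistent with a weighted sum of the three knowledge-distillation losses, with the ECAPA term scaled by 10 and the other two by 1 (matching the `use_beats ... scale_1.0` and `use_ecapa ... scale_10.0` markers in the experiment directory name). A sketch of that reconstruction (illustrative, not the actual training code; the scale values are read off the experiment configuration, not confirmed from source):

```python
def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Illustrative reconstruction of the logged total loss as a weighted sum
    of the three distillation losses. Scales are assumptions taken from the
    experiment directory name."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Batch 4450 entry above: 0.01288 + 10 * 0.0001489 + 0.0626 ≈ 0.07697
total = combined_loss(0.01288, 0.0001489, 0.0626)
```

The same relation holds for the other entries in this chunk, e.g. batch 4300: 0.01166 + 10 × 0.0001481 + 0.07932 ≈ 0.09246.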
], batch size: 61, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:20:56,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2363230.0, ans=0.0 2024-08-13 23:21:14,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2363330.0, ans=0.0 2024-08-13 23:21:28,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.409e+01 2.664e+01 2.942e+01 4.100e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 23:21:30,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2363430.0, ans=0.0 2024-08-13 23:21:34,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2363430.0, ans=0.125 2024-08-13 23:21:36,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2363530.0, ans=0.125 2024-08-13 23:21:47,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2363530.0, ans=0.2 2024-08-13 23:21:55,590 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 23:22:09,834 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4500, loss[loss=0.1063, beats_loss=0.01149, ecapa_loss=0.0001487, whisper_loss=0.09333, over 22500.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001613, whisper_loss=0.0915, over 3869119.18 frames. ], batch size: 90, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:22:22,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.37 vs. 
limit=15.0 2024-08-13 23:22:27,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2363830.0, ans=0.0 2024-08-13 23:22:37,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2363830.0, ans=0.125 2024-08-13 23:22:42,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2363930.0, ans=0.125 2024-08-13 23:22:44,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2363930.0, ans=0.0 2024-08-13 23:22:57,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-13 23:22:58,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2364030.0, ans=0.0 2024-08-13 23:23:00,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-08-13 23:23:07,431 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-13 23:23:10,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2364130.0, ans=0.0 2024-08-13 23:23:24,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4550, loss[loss=0.1065, beats_loss=0.0103, ecapa_loss=0.000142, whisper_loss=0.09475, over 20636.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001619, whisper_loss=0.091, over 3888744.01 frames. ], batch size: 79, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:23:27,418 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
22 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-13 23:23:42,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2364330.0, ans=0.125 2024-08-13 23:23:58,639 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 23:24:00,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.368e+01 2.686e+01 2.952e+01 5.692e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-13 23:24:03,277 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.053e+01 2024-08-13 23:24:11,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2364530.0, ans=0.125 2024-08-13 23:24:13,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2364530.0, ans=0.2 2024-08-13 23:24:15,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2364530.0, ans=0.1 2024-08-13 23:24:30,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2364630.0, ans=0.2 2024-08-13 23:24:33,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4600, loss[loss=0.1059, beats_loss=0.009075, ecapa_loss=0.0001593, whisper_loss=0.09523, over 18479.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001609, whisper_loss=0.09088, over 3906523.21 frames. ], batch size: 73, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:24:49,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2364830.0, ans=0.125 2024-08-13 23:24:53,508 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 23:24:53,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2364830.0, ans=0.125 2024-08-13 23:24:58,807 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 23:25:16,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2365030.0, ans=0.1 2024-08-13 23:25:23,471 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 10 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 23:25:41,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4650, loss[loss=0.09998, beats_loss=0.01325, ecapa_loss=0.0001371, whisper_loss=0.08535, over 21502.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.0001607, whisper_loss=0.0902, over 3893500.36 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:25:43,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2365230.0, ans=0.0 2024-08-13 23:25:44,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2365230.0, ans=0.1 2024-08-13 23:25:48,950 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 23:26:09,995 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 23:26:15,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.449e+01 2.734e+01 2.969e+01 1.115e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-13 23:26:23,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2365530.0, ans=0.1 2024-08-13 23:26:27,978 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 7 from Vox, 32 fro AS 2024-08-13 23:26:36,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2365630.0, ans=0.0 2024-08-13 23:26:47,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4700, loss[loss=0.08032, beats_loss=0.009507, ecapa_loss=0.0001804, whisper_loss=0.06901, over 15435.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.0001607, whisper_loss=0.09048, over 3879404.83 frames. ], batch size: 64, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:26:51,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2365730.0, ans=0.0 2024-08-13 23:26:52,942 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 23:26:54,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2365730.0, ans=0.2 2024-08-13 23:26:56,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2365730.0, ans=0.1 2024-08-13 23:27:07,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2365830.0, ans=0.0 2024-08-13 23:27:17,769 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 23:27:28,266 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 23:27:36,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2366030.0, ans=0.0 2024-08-13 23:27:52,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4750, loss[loss=0.09873, beats_loss=0.01163, ecapa_loss=0.000144, whisper_loss=0.08567, over 22418.00 frames. 
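Each `A total of N cuts. ... from LS+wenet, ... from Vox, ... fro AS` line reports the per-source composition of one batch (`fro AS` is a typo in the logging code for `from AS`; the three sources are LibriSpeech+WenetSpeech, VoxCeleb, and AudioSet). A minimal sketch of producing such a summary, assuming each cut is tagged with its source dataset (`summarize_cut_sources` is a hypothetical helper, not the actual `train_multi_KD3.py` code):

```python
from collections import Counter

def summarize_cut_sources(sources):
    """Hypothetical helper: format a batch-composition line in the style of
    the 'A total of N cuts. ...' messages in the log."""
    counts = Counter(sources)
    breakdown = ", ".join(
        f"{counts[name]} from {name}" for name in ("LS+wenet", "Vox", "AS")
    )
    return f"A total of {len(sources)} cuts. {breakdown}"

# Reproduces the 90-cut record logged at 23:27:28 above.
line = summarize_cut_sources(["LS+wenet"] * 28 + ["Vox"] * 21 + ["AS"] * 41)
```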
], tot_loss[loss=0.1025, beats_loss=0.01086, ecapa_loss=0.0001604, whisper_loss=0.09005, over 3891170.44 frames. ], batch size: 91, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:27:56,078 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 23:28:00,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-13 23:28:15,069 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 23:28:21,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-13 23:28:25,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.418e+01 2.670e+01 2.931e+01 4.166e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-13 23:28:28,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2366430.0, ans=0.0 2024-08-13 23:28:47,528 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 23:28:57,834 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4800, loss[loss=0.09281, beats_loss=0.01117, ecapa_loss=0.000151, whisper_loss=0.08013, over 14019.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0109, ecapa_loss=0.0001601, whisper_loss=0.09015, over 3917290.37 frames. ], batch size: 55, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:29:01,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5 2024-08-13 23:29:07,060 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-13 23:29:17,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2366830.0, ans=0.125 2024-08-13 23:29:18,909 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 23:29:32,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-13 23:29:34,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2366930.0, ans=0.125 2024-08-13 23:29:44,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2367030.0, ans=0.125 2024-08-13 23:29:47,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.57 vs. limit=5.0 2024-08-13 23:29:50,904 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 23:30:02,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4850, loss[loss=0.1077, beats_loss=0.01262, ecapa_loss=0.0001348, whisper_loss=0.09371, over 23932.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0109, ecapa_loss=0.0001595, whisper_loss=0.09053, over 3930779.48 frames. ], batch size: 93, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:30:13,290 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 23:30:15,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. 
limit=15.0 2024-08-13 23:30:23,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2367330.0, ans=0.2 2024-08-13 23:30:23,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-08-13 23:30:33,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-13 23:30:35,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.351e+01 2.637e+01 2.912e+01 5.043e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-13 23:30:36,520 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 23:30:48,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-13 23:30:49,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2367530.0, ans=0.0 2024-08-13 23:31:03,924 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 23:31:05,100 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 23:31:05,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2367630.0, ans=0.125 2024-08-13 23:31:06,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2367730.0, ans=0.0 2024-08-13 23:31:07,474 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4900, loss[loss=0.08305, beats_loss=0.0128, ecapa_loss=0.000156, whisper_loss=0.06869, over 15383.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01089, ecapa_loss=0.0001596, whisper_loss=0.09047, over 3897329.68 frames. ], batch size: 63, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:31:22,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2367830.0, ans=0.125 2024-08-13 23:31:40,491 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 23:31:42,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2367930.0, ans=0.0 2024-08-13 23:31:51,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2368030.0, ans=0.0 2024-08-13 23:31:51,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2368030.0, ans=0.0 2024-08-13 23:31:52,295 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 23:31:53,725 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-13 23:32:01,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2368130.0, ans=0.125 2024-08-13 23:32:13,305 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 4950, loss[loss=0.105, beats_loss=0.008082, ecapa_loss=0.0001434, whisper_loss=0.09544, over 14439.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01084, ecapa_loss=0.0001591, whisper_loss=0.09021, over 3863109.80 frames. 
], batch size: 54, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:32:18,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2368230.0, ans=0.2 2024-08-13 23:32:27,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2368330.0, ans=0.125 2024-08-13 23:32:29,876 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 23:32:46,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.294e+01 2.547e+01 2.845e+01 3.862e+01, threshold=5.095e+01, percent-clipped=0.0 2024-08-13 23:32:47,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.81 vs. limit=22.5 2024-08-13 23:32:53,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2368530.0, ans=0.0 2024-08-13 23:32:54,391 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 23:32:55,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2024-08-13 23:33:07,417 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 23:33:19,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5000, loss[loss=0.09883, beats_loss=0.01115, ecapa_loss=0.0001745, whisper_loss=0.08594, over 21743.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001595, whisper_loss=0.09072, over 3881804.59 frames. ], batch size: 92, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:33:19,352 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 23:33:29,215 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 23:33:33,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2368830.0, ans=0.0 2024-08-13 23:33:39,877 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 23:33:55,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2368930.0, ans=0.125 2024-08-13 23:33:58,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-13 23:34:01,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2369030.0, ans=0.1 2024-08-13 23:34:02,920 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 23:34:14,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2369130.0, ans=0.1 2024-08-13 23:34:23,207 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5050, loss[loss=0.09916, beats_loss=0.01049, ecapa_loss=0.0002014, whisper_loss=0.08665, over 18637.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001595, whisper_loss=0.09129, over 3892221.73 frames. 
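The `ScheduledFloat: name=..., batch_count=..., ans=...` lines from `scaling.py` print hyperparameters whose value depends on training progress. In zipformer-style models these are typically piecewise-linear schedules over the batch count; a sketch under that assumption (illustrative only, not the actual `scaling.py` implementation):

```python
def scheduled_float(batch_count, points):
    """Illustrative piecewise-linear schedule over batch count.

    `points` is a list of (batch_count, value) breakpoints sorted by
    batch_count; the value is held constant outside the covered range.
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            frac = (batch_count - x0) / (x1 - x0)
            return y0 + frac * (y1 - y0)

# Hypothetical dropout schedule: 0.3 at batch 0 decaying to 0.1 at batch
# 20000, then held; consistent with the dropout_p lines above printing
# ans=0.1 this late in training (batch_count > 2.3 million).
p = scheduled_float(2369430, [(0.0, 0.3), (20000.0, 0.1)])
```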
], batch size: 80, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:34:23,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2369230.0, ans=0.125 2024-08-13 23:34:23,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2369230.0, ans=0.125 2024-08-13 23:34:44,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2369330.0, ans=0.0 2024-08-13 23:34:48,908 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 13 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 23:34:55,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.306e+01 2.530e+01 2.921e+01 5.103e+01, threshold=5.061e+01, percent-clipped=1.0 2024-08-13 23:34:58,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2369430.0, ans=0.0 2024-08-13 23:34:59,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2369430.0, ans=0.1 2024-08-13 23:35:01,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2369530.0, ans=0.2 2024-08-13 23:35:03,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2369530.0, ans=0.0 2024-08-13 23:35:11,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2369530.0, ans=0.125 2024-08-13 23:35:13,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2369630.0, ans=0.1 2024-08-13 23:35:19,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, 
batch_count=2369630.0, ans=10.0 2024-08-13 23:35:27,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5100, loss[loss=0.111, beats_loss=0.01087, ecapa_loss=0.0001673, whisper_loss=0.09845, over 16570.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01094, ecapa_loss=0.0001587, whisper_loss=0.09073, over 3874691.86 frames. ], batch size: 66, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:35:30,322 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 23:35:31,519 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 23:35:43,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2369830.0, ans=0.0 2024-08-13 23:35:52,174 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 10 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 23:35:53,618 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 28 from LS+wenet, 23 from Vox, 14 fro AS 2024-08-13 23:35:54,826 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 23:35:57,488 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 23:36:02,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2369930.0, ans=0.09899494936611666 2024-08-13 23:36:04,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2369930.0, ans=0.0 2024-08-13 23:36:21,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2370130.0, ans=0.1 2024-08-13 23:36:26,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2370130.0, ans=0.125 2024-08-13 23:36:32,319 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5150, loss[loss=0.09412, beats_loss=0.01148, ecapa_loss=0.0001634, whisper_loss=0.08101, over 19700.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001584, whisper_loss=0.09117, over 3839653.05 frames. ], batch size: 80, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:36:32,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2370230.0, ans=0.1 2024-08-13 23:36:40,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2370230.0, ans=0.05 2024-08-13 23:36:45,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2370330.0, ans=0.05 2024-08-13 23:36:46,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0 2024-08-13 23:37:04,138 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-13 23:37:05,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.435e+01 2.636e+01 3.072e+01 5.034e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-13 23:37:14,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2370530.0, ans=0.125 2024-08-13 23:37:16,729 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-13 23:37:23,395 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:37:34,928 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 23:37:35,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2370630.0, ans=0.125 2024-08-13 23:37:35,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2370630.0, ans=0.125 2024-08-13 23:37:37,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5200, loss[loss=0.1193, beats_loss=0.008109, ecapa_loss=0.0001976, whisper_loss=0.1092, over 19413.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001578, whisper_loss=0.09121, over 3813614.41 frames. ], batch size: 79, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:37:39,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2370730.0, ans=0.125 2024-08-13 23:37:40,009 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
22 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 23:37:49,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2370830.0, ans=0.2 2024-08-13 23:37:56,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2370830.0, ans=0.1 2024-08-13 23:37:59,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2370830.0, ans=0.125 2024-08-13 23:38:00,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2370830.0, ans=0.0 2024-08-13 23:38:01,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2024-08-13 23:38:15,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2371030.0, ans=0.09899494936611666 2024-08-13 23:38:34,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2371130.0, ans=0.125 2024-08-13 23:38:40,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5250, loss[loss=0.07563, beats_loss=0.0102, ecapa_loss=0.0001525, whisper_loss=0.06391, over 14030.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0108, ecapa_loss=0.0001594, whisper_loss=0.09047, over 3779698.66 frames. ], batch size: 58, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:38:54,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2371330.0, ans=0.2 2024-08-13 23:38:56,802 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
25 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 23:38:58,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2371330.0, ans=10.0 2024-08-13 23:39:13,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.304e+01 2.584e+01 2.839e+01 8.080e+01, threshold=5.168e+01, percent-clipped=1.0 2024-08-13 23:39:17,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=12.0 2024-08-13 23:39:35,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2371630.0, ans=0.2 2024-08-13 23:39:35,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2024-08-13 23:39:35,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-13 23:39:39,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2371630.0, ans=0.125 2024-08-13 23:39:45,200 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5300, loss[loss=0.1175, beats_loss=0.01044, ecapa_loss=0.0001412, whisper_loss=0.1056, over 23125.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001594, whisper_loss=0.09081, over 3801950.04 frames. ], batch size: 88, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:39:47,828 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 28 from LS+wenet, 10 from Vox, 16 fro AS 2024-08-13 23:40:02,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.33 vs. 
limit=15.0 2024-08-13 23:40:03,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2371830.0, ans=0.125 2024-08-13 23:40:22,737 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 8 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 23:40:31,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2372030.0, ans=0.125 2024-08-13 23:40:34,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2024-08-13 23:40:46,700 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 23:40:46,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2372130.0, ans=0.2 2024-08-13 23:40:48,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5350, loss[loss=0.1116, beats_loss=0.01188, ecapa_loss=0.000135, whisper_loss=0.09842, over 19654.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001599, whisper_loss=0.09043, over 3807721.47 frames. ], batch size: 75, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:40:54,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. 
limit=6.0 2024-08-13 23:40:54,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2372230.0, ans=0.125 2024-08-13 23:40:55,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2372230.0, ans=0.125 2024-08-13 23:41:09,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2372330.0, ans=0.125 2024-08-13 23:41:15,054 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 23:41:21,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.441e+01 2.659e+01 2.902e+01 4.183e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-13 23:41:53,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5400, loss[loss=0.1031, beats_loss=0.01022, ecapa_loss=0.0001375, whisper_loss=0.0915, over 23479.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001594, whisper_loss=0.09095, over 3838767.41 frames. ], batch size: 91, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:42:08,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2372830.0, ans=0.07 2024-08-13 23:42:26,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2372930.0, ans=0.0 2024-08-13 23:42:30,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.57 vs. limit=5.0 2024-08-13 23:42:44,053 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 23:42:45,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2373130.0, ans=0.0 2024-08-13 23:42:57,392 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5450, loss[loss=0.0916, beats_loss=0.01278, ecapa_loss=0.0001476, whisper_loss=0.07734, over 16560.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001585, whisper_loss=0.09108, over 3860460.86 frames. ], batch size: 71, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:43:07,580 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 23:43:09,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2373330.0, ans=0.0 2024-08-13 23:43:25,673 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 23:43:28,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2373430.0, ans=0.125 2024-08-13 23:43:29,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.305e+01 2.546e+01 2.870e+01 4.387e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-13 23:43:32,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2373430.0, ans=0.125 2024-08-13 23:43:43,643 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 23:43:49,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=22.5 2024-08-13 23:43:52,903 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 23:43:53,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2373630.0, ans=0.0 2024-08-13 23:43:57,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.17 vs. limit=10.0 2024-08-13 23:44:02,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5500, loss[loss=0.1227, beats_loss=0.008378, ecapa_loss=0.0001524, whisper_loss=0.1128, over 21233.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01076, ecapa_loss=0.0001584, whisper_loss=0.09028, over 3846752.57 frames. ], batch size: 77, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:44:12,570 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 23:44:17,128 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 17 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-13 23:44:27,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2024-08-13 23:44:58,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2374030.0, ans=0.1 2024-08-13 23:45:14,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5550, loss[loss=0.096, beats_loss=0.009614, ecapa_loss=0.0001927, whisper_loss=0.08446, over 21566.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01081, ecapa_loss=0.0001589, whisper_loss=0.0902, over 3886865.29 frames. 
], batch size: 91, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:45:19,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2374230.0, ans=0.07 2024-08-13 23:45:25,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2374230.0, ans=0.1 2024-08-13 23:45:29,659 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 23:45:42,791 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 23:45:45,682 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 14 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 23:45:51,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.310e+01 2.523e+01 2.896e+01 4.190e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-13 23:45:52,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2374430.0, ans=0.95 2024-08-13 23:46:14,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2374630.0, ans=0.1 2024-08-13 23:46:17,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2374630.0, ans=0.09899494936611666 2024-08-13 23:46:26,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5600, loss[loss=0.08903, beats_loss=0.01112, ecapa_loss=0.0001546, whisper_loss=0.07637, over 13895.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.0001596, whisper_loss=0.09064, over 3859417.32 frames. 
], batch size: 53, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:46:36,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2374730.0, ans=0.125 2024-08-13 23:46:45,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2024-08-13 23:46:46,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2374830.0, ans=0.125 2024-08-13 23:46:49,422 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 23:46:54,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2374930.0, ans=0.1 2024-08-13 23:46:59,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2374930.0, ans=0.125 2024-08-13 23:47:05,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2374930.0, ans=0.1 2024-08-13 23:47:12,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2375030.0, ans=0.0 2024-08-13 23:47:17,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2375030.0, ans=0.07 2024-08-13 23:47:26,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2024-08-13 23:47:38,598 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 23:47:39,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5650, loss[loss=0.09076, beats_loss=0.01107, ecapa_loss=0.0001685, whisper_loss=0.078, over 18767.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001608, whisper_loss=0.09025, over 3887837.27 frames. ], batch size: 76, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:47:44,338 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:47:44,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.07 vs. limit=10.0 2024-08-13 23:47:56,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.68 vs. limit=5.0 2024-08-13 23:48:02,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2375330.0, ans=0.05 2024-08-13 23:48:06,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2375430.0, ans=15.0 2024-08-13 23:48:08,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2024-08-13 23:48:09,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2375430.0, ans=0.1 2024-08-13 23:48:13,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.432e+01 2.622e+01 2.958e+01 1.611e+02, threshold=5.244e+01, percent-clipped=2.0 2024-08-13 23:48:24,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2375530.0, ans=0.035 2024-08-13 23:48:37,656 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-13 23:48:39,968 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
23 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-13 23:48:45,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2375730.0, ans=0.125 2024-08-13 23:48:45,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2375730.0, ans=22.5 2024-08-13 23:48:46,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5700, loss[loss=0.123, beats_loss=0.009202, ecapa_loss=0.0001548, whisper_loss=0.1123, over 24358.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001604, whisper_loss=0.09125, over 3915107.43 frames. ], batch size: 94, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:48:56,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2024-08-13 23:49:03,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2375830.0, ans=0.2 2024-08-13 23:49:06,075 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 23:49:12,004 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 23:49:12,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2375930.0, ans=0.125 2024-08-13 23:49:12,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. 
limit=10.0 2024-08-13 23:49:15,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2375930.0, ans=0.1 2024-08-13 23:49:19,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2375930.0, ans=0.125 2024-08-13 23:49:20,617 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 23:49:34,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-13 23:49:46,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2376130.0, ans=0.125 2024-08-13 23:49:56,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-08-13 23:49:57,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5750, loss[loss=0.1191, beats_loss=0.0121, ecapa_loss=0.0001792, whisper_loss=0.1052, over 22457.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01074, ecapa_loss=0.0001612, whisper_loss=0.09204, over 3918045.51 frames. ], batch size: 91, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:49:58,624 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 23:49:59,822 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 23:50:00,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. 
limit=10.0 2024-08-13 23:50:19,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2376330.0, ans=0.0 2024-08-13 23:50:24,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2376430.0, ans=0.125 2024-08-13 23:50:25,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.99 vs. limit=10.0 2024-08-13 23:50:32,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.376e+01 2.677e+01 2.886e+01 5.408e+01, threshold=5.355e+01, percent-clipped=1.0 2024-08-13 23:50:40,599 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 23:50:54,919 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 23:51:09,491 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5800, loss[loss=0.1226, beats_loss=0.009136, ecapa_loss=0.0001615, whisper_loss=0.1118, over 17680.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001609, whisper_loss=0.09171, over 3919570.20 frames. ], batch size: 67, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:51:18,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2376730.0, ans=0.1 2024-08-13 23:51:19,336 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 23:51:23,732 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.708e+01 2024-08-13 23:51:42,900 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.409e+01 2024-08-13 23:51:52,531 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 23:51:55,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2377030.0, ans=0.1 2024-08-13 23:51:56,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2377030.0, ans=0.125 2024-08-13 23:51:56,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2024-08-13 23:52:01,833 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 23:52:14,879 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-13 23:52:17,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2377230.0, ans=0.0 2024-08-13 23:52:18,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5850, loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.000171, whisper_loss=0.08867, over 17530.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001601, whisper_loss=0.09135, over 3923855.52 frames. ], batch size: 70, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:52:22,219 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 23:52:27,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2377230.0, ans=0.025 2024-08-13 23:52:44,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2377430.0, ans=0.125 2024-08-13 23:52:46,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.59 vs. 
limit=15.0 2024-08-13 23:52:50,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-13 23:52:50,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.427e+01 2.667e+01 3.028e+01 6.435e+01, threshold=5.335e+01, percent-clipped=1.0 2024-08-13 23:52:55,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2377430.0, ans=0.125 2024-08-13 23:52:59,162 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-13 23:53:05,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2377530.0, ans=0.125 2024-08-13 23:53:15,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2377630.0, ans=0.0 2024-08-13 23:53:16,105 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 23:53:23,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5900, loss[loss=0.1156, beats_loss=0.01087, ecapa_loss=0.0001268, whisper_loss=0.1035, over 17088.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001608, whisper_loss=0.09102, over 3889704.76 frames. 
], batch size: 64, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:53:38,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2377830.0, ans=0.1 2024-08-13 23:53:38,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2377830.0, ans=0.125 2024-08-13 23:53:39,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2377830.0, ans=0.1 2024-08-13 23:53:42,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2377830.0, ans=0.0 2024-08-13 23:53:58,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2377930.0, ans=0.2 2024-08-13 23:54:08,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2378030.0, ans=0.05 2024-08-13 23:54:19,340 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 23:54:23,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2378130.0, ans=0.0 2024-08-13 23:54:26,814 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 23:54:28,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 5950, loss[loss=0.07073, beats_loss=0.01365, ecapa_loss=0.0001561, whisper_loss=0.05552, over 15906.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.0001608, whisper_loss=0.09012, over 3871271.90 frames. 
], batch size: 66, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:54:41,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2024-08-13 23:54:43,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 23:54:49,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2378330.0, ans=0.1 2024-08-13 23:55:00,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.346e+01 2.593e+01 2.833e+01 5.502e+01, threshold=5.186e+01, percent-clipped=1.0 2024-08-13 23:55:06,790 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 23:55:20,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2378630.0, ans=0.2 2024-08-13 23:55:31,522 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 23:55:32,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6000, loss[loss=0.1199, beats_loss=0.008143, ecapa_loss=0.0001678, whisper_loss=0.1101, over 21686.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001609, whisper_loss=0.09038, over 3860452.42 frames. ], batch size: 85, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:55:32,582 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 23:56:14,146 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005558, whisper_loss=0.2472, over 922467.00 frames. 
2024-08-13 23:56:35,175 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on SV_voxceleb1: loss=0.004377, beats_loss=0, ecapa_loss=0.0004377, whisper_loss=0, over 939242.00 frames. 2024-08-13 23:58:33,220 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on AT_audioset: loss=0.02362, beats_loss=0.02362, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 23:58:33,225 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-13 23:58:35,876 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-13 23:58:36,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2378730.0, ans=0.0 2024-08-13 23:58:38,294 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 23:58:55,201 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 23:59:08,036 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-13 23:59:20,708 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09559547156095505, model_norm_threshold=51.8635368347168 2024-08-13 23:59:20,955 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.554e+04, grad_sumsq=7.554e+04, orig_rms_sq=1.000e+00 2024-08-13 23:59:29,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2379130.0, ans=0.125 2024-08-13 23:59:30,944 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 23:59:42,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2379230.0, ans=0.0 2024-08-13 23:59:43,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2024-08-13 23:59:44,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6050, loss[loss=0.1146, beats_loss=0.008613, ecapa_loss=0.0001855, whisper_loss=0.1042, over 21324.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01082, ecapa_loss=0.0001609, whisper_loss=0.09059, over 3842905.75 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:59:44,580 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 23:59:47,481 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 23:59:54,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=12.0 2024-08-13 23:59:58,982 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 00:00:08,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2379330.0, ans=0.1 2024-08-14 00:00:12,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2379430.0, ans=0.1 2024-08-14 00:00:20,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.343e+01 2.535e+01 2.756e+01 5.425e+02, threshold=5.070e+01, percent-clipped=3.0 2024-08-14 00:00:21,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2379430.0, ans=0.125 2024-08-14 00:00:42,492 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 00:00:45,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2379630.0, ans=0.125 2024-08-14 00:00:57,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6100, loss[loss=0.106, beats_loss=0.01137, ecapa_loss=0.0001926, whisper_loss=0.0927, over 21339.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001611, whisper_loss=0.09114, over 3843920.78 frames. ], batch size: 87, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:01:06,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2379730.0, ans=0.125 2024-08-14 00:01:16,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2379830.0, ans=0.125 2024-08-14 00:01:40,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. 
limit=15.0 2024-08-14 00:01:49,150 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 00:01:51,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-08-14 00:01:55,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2380130.0, ans=0.2 2024-08-14 00:01:56,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0 2024-08-14 00:02:04,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=12.0 2024-08-14 00:02:04,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6150, loss[loss=0.1008, beats_loss=0.0108, ecapa_loss=0.0001458, whisper_loss=0.0885, over 22569.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001605, whisper_loss=0.09074, over 3836294.53 frames. 
], batch size: 91, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:02:11,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2380230.0, ans=0.125 2024-08-14 00:02:23,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2380330.0, ans=0.2 2024-08-14 00:02:31,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2380430.0, ans=0.1 2024-08-14 00:02:32,009 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:02:36,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.475e+01 2.774e+01 3.233e+01 4.746e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 00:02:46,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2380530.0, ans=0.125 2024-08-14 00:02:52,609 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 00:02:59,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2380630.0, ans=0.125 2024-08-14 00:03:09,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6200, loss[loss=0.08276, beats_loss=0.01153, ecapa_loss=0.0002374, whisper_loss=0.06886, over 13713.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001598, whisper_loss=0.09133, over 3863221.19 frames. ], batch size: 60, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:03:12,206 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
30 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 00:03:22,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2380830.0, ans=0.2 2024-08-14 00:03:36,376 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 00:03:46,609 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 00:03:54,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2381030.0, ans=0.0 2024-08-14 00:04:06,000 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 00:04:12,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2381130.0, ans=0.125 2024-08-14 00:04:15,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6250, loss[loss=0.1023, beats_loss=0.009638, ecapa_loss=0.0001593, whisper_loss=0.09106, over 16095.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001598, whisper_loss=0.09121, over 3841882.77 frames. ], batch size: 61, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:04:31,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2381330.0, ans=0.125 2024-08-14 00:04:44,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. 
limit=15.0 2024-08-14 00:04:48,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.400e+01 2.693e+01 3.116e+01 1.076e+02, threshold=5.386e+01, percent-clipped=3.0 2024-08-14 00:04:56,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2381530.0, ans=0.125 2024-08-14 00:05:04,237 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 00:05:15,990 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 00:05:19,759 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6300, loss[loss=0.1229, beats_loss=0.009015, ecapa_loss=0.0001542, whisper_loss=0.1124, over 21790.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001603, whisper_loss=0.09156, over 3838604.19 frames. ], batch size: 84, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:05:27,302 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.579e+05 2024-08-14 00:05:32,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2381830.0, ans=0.05 2024-08-14 00:05:36,329 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 00:05:40,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2381830.0, ans=0.125 2024-08-14 00:05:44,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2381930.0, ans=0.125 2024-08-14 00:05:50,177 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 00:05:53,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2381930.0, ans=0.1 2024-08-14 00:05:57,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2382030.0, ans=0.125 2024-08-14 00:06:09,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2382130.0, ans=0.125 2024-08-14 00:06:16,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2382130.0, ans=0.0 2024-08-14 00:06:23,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6350, loss[loss=0.1085, beats_loss=0.009261, ecapa_loss=0.0001643, whisper_loss=0.0976, over 18058.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.000161, whisper_loss=0.09111, over 3839420.39 frames. ], batch size: 72, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:06:27,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2382230.0, ans=0.04949747468305833 2024-08-14 00:06:56,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2382430.0, ans=0.125 2024-08-14 00:06:57,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.344e+01 2.620e+01 2.945e+01 1.011e+02, threshold=5.239e+01, percent-clipped=2.0 2024-08-14 00:06:59,353 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:07:05,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.71 vs. 
limit=12.0 2024-08-14 00:07:16,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2382630.0, ans=0.0 2024-08-14 00:07:21,249 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 00:07:23,768 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 00:07:28,797 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6400, loss[loss=0.1034, beats_loss=0.009891, ecapa_loss=0.0001703, whisper_loss=0.09178, over 22873.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001616, whisper_loss=0.09108, over 3853390.75 frames. ], batch size: 93, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:07:31,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2382730.0, ans=0.0 2024-08-14 00:07:35,479 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 00:07:35,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2382730.0, ans=0.125 2024-08-14 00:07:44,553 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-14 00:07:52,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2382830.0, ans=0.125 2024-08-14 00:07:55,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-14 00:08:10,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. 
limit=15.0 2024-08-14 00:08:13,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2383030.0, ans=0.125 2024-08-14 00:08:19,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2383130.0, ans=0.2 2024-08-14 00:08:28,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2383130.0, ans=0.125 2024-08-14 00:08:34,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6450, loss[loss=0.1228, beats_loss=0.009541, ecapa_loss=0.0001456, whisper_loss=0.1118, over 22814.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001607, whisper_loss=0.09137, over 3833305.94 frames. ], batch size: 88, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:08:48,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2383330.0, ans=0.125 2024-08-14 00:08:53,405 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 15 from LS+wenet, 26 from Vox, 52 fro AS 2024-08-14 00:08:57,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2383330.0, ans=0.125 2024-08-14 00:08:58,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2383430.0, ans=0.125 2024-08-14 00:09:04,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2383430.0, ans=0.0 2024-08-14 00:09:06,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.325e+01 2.600e+01 2.932e+01 4.417e+01, threshold=5.200e+01, percent-clipped=0.0 2024-08-14 00:09:12,794 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 00:09:30,330 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 00:09:37,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6500, loss[loss=0.09238, beats_loss=0.01258, ecapa_loss=0.000145, whisper_loss=0.07835, over 23212.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001612, whisper_loss=0.09127, over 3850520.13 frames. ], batch size: 95, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:09:52,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2383830.0, ans=0.0 2024-08-14 00:09:54,786 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 32 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 00:10:09,152 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-14 00:10:11,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2383930.0, ans=0.125 2024-08-14 00:10:12,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2024-08-14 00:10:29,921 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 00:10:40,737 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 00:10:41,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6550, loss[loss=0.1221, beats_loss=0.008494, ecapa_loss=0.0001851, whisper_loss=0.1118, over 22535.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01064, ecapa_loss=0.0001606, whisper_loss=0.09213, over 3902836.11 frames. 
], batch size: 92, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:10:44,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2384230.0, ans=0.125 2024-08-14 00:10:49,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-14 00:10:52,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2384230.0, ans=0.2 2024-08-14 00:10:54,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0 2024-08-14 00:10:56,212 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 00:11:02,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2384330.0, ans=0.0 2024-08-14 00:11:03,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2384330.0, ans=0.125 2024-08-14 00:11:09,013 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
17 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 00:11:10,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2384430.0, ans=0.125 2024-08-14 00:11:15,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.424e+01 2.648e+01 2.996e+01 4.448e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-14 00:11:24,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2384530.0, ans=0.2 2024-08-14 00:11:29,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2384530.0, ans=0.1 2024-08-14 00:11:30,876 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-14 00:11:33,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2384630.0, ans=0.125 2024-08-14 00:11:38,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2384630.0, ans=0.0 2024-08-14 00:11:44,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2384730.0, ans=0.125 2024-08-14 00:11:45,384 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09902217984199524, model_norm_threshold=52.96651840209961 2024-08-14 00:11:45,561 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.30, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.471e+04, grad_sumsq=8.471e+04, orig_rms_sq=1.000e+00 2024-08-14 00:11:45,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6600, loss[loss=0.1015, beats_loss=0.01215, ecapa_loss=0.0001562, whisper_loss=0.0878, over 18696.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.0106, ecapa_loss=0.0001611, whisper_loss=0.0933, over 3959989.98 frames. ], batch size: 76, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:11:45,694 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 00:11:55,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2384730.0, ans=0.0 2024-08-14 00:12:03,512 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 00:12:05,903 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 00:12:49,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6650, loss[loss=0.09866, beats_loss=0.01232, ecapa_loss=0.0001597, whisper_loss=0.08474, over 20650.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01057, ecapa_loss=0.0001614, whisper_loss=0.09357, over 3959192.25 frames. ], batch size: 83, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:12:54,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2385230.0, ans=0.125 2024-08-14 00:12:55,680 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
30 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 00:12:59,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2385230.0, ans=0.125 2024-08-14 00:13:06,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2385330.0, ans=0.125 2024-08-14 00:13:22,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.456e+01 2.724e+01 3.056e+01 5.349e+02, threshold=5.448e+01, percent-clipped=1.0 2024-08-14 00:13:31,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-08-14 00:13:36,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2385530.0, ans=0.125 2024-08-14 00:13:41,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2385630.0, ans=0.1 2024-08-14 00:13:49,423 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 00:13:53,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6700, loss[loss=0.09689, beats_loss=0.01227, ecapa_loss=0.0001701, whisper_loss=0.08291, over 19694.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01061, ecapa_loss=0.0001626, whisper_loss=0.09304, over 3951391.38 frames. ], batch size: 80, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:13:53,297 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
24 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 00:13:58,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2385730.0, ans=0.04949747468305833 2024-08-14 00:14:12,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2385830.0, ans=0.125 2024-08-14 00:14:26,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2385930.0, ans=0.125 2024-08-14 00:14:30,626 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 00:14:38,297 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 00:14:43,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2386130.0, ans=0.125 2024-08-14 00:14:43,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2386130.0, ans=0.1 2024-08-14 00:14:57,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6750, loss[loss=0.09464, beats_loss=0.0119, ecapa_loss=0.0001571, whisper_loss=0.08117, over 21505.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01059, ecapa_loss=0.0001627, whisper_loss=0.09305, over 3939700.49 frames. ], batch size: 88, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:14:57,770 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-14 00:14:57,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2386230.0, ans=0.0 2024-08-14 00:15:02,858 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 00:15:04,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2024-08-14 00:15:17,376 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 00:15:31,289 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.377e+01 2.658e+01 2.891e+01 6.359e+01, threshold=5.316e+01, percent-clipped=1.0 2024-08-14 00:15:44,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-14 00:15:44,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-14 00:15:45,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2386530.0, ans=0.1 2024-08-14 00:16:02,326 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6800, loss[loss=0.103, beats_loss=0.01245, ecapa_loss=0.0001637, whisper_loss=0.08893, over 22309.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001623, whisper_loss=0.09192, over 3922950.34 frames. ], batch size: 91, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:16:05,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2024-08-14 00:16:09,687 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
14 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-14 00:16:21,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2386830.0, ans=0.0 2024-08-14 00:16:25,449 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-14 00:16:25,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2386830.0, ans=10.0 2024-08-14 00:16:28,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2386930.0, ans=0.1 2024-08-14 00:16:32,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2386930.0, ans=0.0 2024-08-14 00:16:40,149 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 00:16:47,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-14 00:16:53,695 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 00:16:54,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5 2024-08-14 00:17:00,384 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 00:17:06,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6850, loss[loss=0.1157, beats_loss=0.01061, ecapa_loss=0.0001527, whisper_loss=0.1035, over 22436.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01072, ecapa_loss=0.0001605, whisper_loss=0.09227, over 3939266.68 frames. 
], batch size: 89, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:17:06,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2387230.0, ans=0.2 2024-08-14 00:17:13,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-14 00:17:17,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0 2024-08-14 00:17:18,605 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 42 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-14 00:17:25,094 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 00:17:26,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2387330.0, ans=0.0 2024-08-14 00:17:28,941 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 00:17:30,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2387330.0, ans=0.125 2024-08-14 00:17:30,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2387330.0, ans=0.07 2024-08-14 00:17:41,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.424e+01 2.658e+01 2.894e+01 9.462e+01, threshold=5.316e+01, percent-clipped=2.0 2024-08-14 00:17:56,939 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 00:18:00,803 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
25 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-14 00:18:06,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2387630.0, ans=0.07 2024-08-14 00:18:11,974 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6900, loss[loss=0.09348, beats_loss=0.013, ecapa_loss=0.0001511, whisper_loss=0.07897, over 19702.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001612, whisper_loss=0.09146, over 3916034.28 frames. ], batch size: 83, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:18:30,883 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-14 00:18:42,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=12.0 2024-08-14 00:18:56,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.33 vs. limit=22.5 2024-08-14 00:18:58,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2388030.0, ans=0.125 2024-08-14 00:19:03,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2388030.0, ans=0.2 2024-08-14 00:19:09,705 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 00:19:14,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-08-14 00:19:20,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 6950, loss[loss=0.09781, beats_loss=0.01346, ecapa_loss=0.0001206, whisper_loss=0.08315, over 23483.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001598, whisper_loss=0.09161, over 3887957.14 frames. 
], batch size: 93, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:19:34,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2388330.0, ans=0.125 2024-08-14 00:19:51,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2388430.0, ans=0.125 2024-08-14 00:20:00,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.466e+01 2.702e+01 3.028e+01 4.381e+01, threshold=5.405e+01, percent-clipped=0.0 2024-08-14 00:20:07,697 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:20:25,479 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-14 00:20:38,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7000, loss[loss=0.1135, beats_loss=0.009135, ecapa_loss=0.0001678, whisper_loss=0.1027, over 19227.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01087, ecapa_loss=0.0001596, whisper_loss=0.09037, over 3864742.24 frames. ], batch size: 78, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:20:50,711 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 00:20:51,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2388730.0, ans=0.125 2024-08-14 00:20:59,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.98 vs. 
limit=15.0 2024-08-14 00:21:04,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2388830.0, ans=0.125 2024-08-14 00:21:06,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2388830.0, ans=0.0 2024-08-14 00:21:24,584 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 00:21:26,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2389030.0, ans=0.125 2024-08-14 00:21:39,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2389030.0, ans=0.125 2024-08-14 00:21:50,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2389130.0, ans=0.0 2024-08-14 00:21:53,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2389130.0, ans=0.125 2024-08-14 00:21:58,303 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7050, loss[loss=0.1209, beats_loss=0.01045, ecapa_loss=0.0001476, whisper_loss=0.109, over 22435.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01081, ecapa_loss=0.0001602, whisper_loss=0.09087, over 3874764.07 frames. ], batch size: 87, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:22:00,782 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 00:22:08,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2389230.0, ans=0.125 2024-08-14 00:22:21,339 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 00:22:28,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2389330.0, ans=0.0 2024-08-14 00:22:30,314 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 00:22:42,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.266e+01 2.592e+01 2.903e+01 1.485e+02, threshold=5.183e+01, percent-clipped=2.0 2024-08-14 00:22:51,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2389530.0, ans=0.0 2024-08-14 00:22:54,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2389530.0, ans=0.015 2024-08-14 00:23:07,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2389630.0, ans=0.125 2024-08-14 00:23:14,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2389630.0, ans=0.125 2024-08-14 00:23:15,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2389630.0, ans=0.2 2024-08-14 00:23:19,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7100, loss[loss=0.1054, beats_loss=0.009612, ecapa_loss=0.0002115, whisper_loss=0.09364, over 21440.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.000159, whisper_loss=0.0908, over 3900025.36 frames. ], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:23:28,775 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
35 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 00:23:49,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2389930.0, ans=0.0 2024-08-14 00:24:08,550 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 00:24:36,177 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 00:24:39,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7150, loss[loss=0.09187, beats_loss=0.01007, ecapa_loss=0.0002039, whisper_loss=0.07976, over 19697.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.000159, whisper_loss=0.09133, over 3924146.90 frames. ], batch size: 87, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:24:43,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2390230.0, ans=0.2 2024-08-14 00:24:46,685 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 00:24:58,925 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 00:25:09,000 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 00:25:14,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. 
limit=6.0 2024-08-14 00:25:20,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.385e+01 2.638e+01 3.035e+01 4.278e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-14 00:25:30,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2390530.0, ans=0.0 2024-08-14 00:25:57,503 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7200, loss[loss=0.09918, beats_loss=0.0107, ecapa_loss=0.0001139, whisper_loss=0.08734, over 22336.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0107, ecapa_loss=0.0001596, whisper_loss=0.09199, over 3921099.10 frames. ], batch size: 84, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:25:59,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2390730.0, ans=0.05 2024-08-14 00:26:04,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2390730.0, ans=0.125 2024-08-14 00:26:09,306 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 00:26:17,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2390830.0, ans=0.125 2024-08-14 00:26:21,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2390830.0, ans=0.0 2024-08-14 00:26:29,286 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 00:26:34,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-14 00:26:35,587 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 00:26:42,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2391030.0, ans=0.125 2024-08-14 00:26:45,131 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 00:26:45,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-08-14 00:27:10,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2391130.0, ans=0.1 2024-08-14 00:27:12,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2391130.0, ans=0.2 2024-08-14 00:27:14,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7250, loss[loss=0.1001, beats_loss=0.01237, ecapa_loss=0.0001242, whisper_loss=0.08649, over 23039.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01072, ecapa_loss=0.00016, whisper_loss=0.09156, over 3888004.23 frames. ], batch size: 90, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:27:20,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2391230.0, ans=0.1 2024-08-14 00:27:35,256 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 00:27:41,076 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 8 from Vox, 46 fro AS 2024-08-14 00:27:43,079 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 00:27:44,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2391430.0, ans=0.0 2024-08-14 00:27:55,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.401e+01 2.589e+01 2.911e+01 7.095e+01, threshold=5.179e+01, percent-clipped=1.0 2024-08-14 00:27:59,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5 2024-08-14 00:28:24,013 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 00:28:33,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7300, loss[loss=0.09662, beats_loss=0.01174, ecapa_loss=0.0001494, whisper_loss=0.08338, over 22381.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01069, ecapa_loss=0.0001611, whisper_loss=0.09216, over 3893922.63 frames. ], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:28:45,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2391730.0, ans=0.125 2024-08-14 00:28:58,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-14 00:29:22,756 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-14 00:29:29,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2392030.0, ans=0.05 2024-08-14 00:29:50,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7350, loss[loss=0.1163, beats_loss=0.007465, ecapa_loss=0.0002186, whisper_loss=0.1067, over 18177.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001605, whisper_loss=0.09204, over 3891640.91 frames. ], batch size: 77, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:29:58,623 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-14 00:30:32,264 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.397e+01 2.587e+01 2.821e+01 4.137e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-14 00:30:40,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2392530.0, ans=0.0 2024-08-14 00:30:51,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2392530.0, ans=0.125 2024-08-14 00:31:03,152 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 00:31:06,940 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 33 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 00:31:08,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2392630.0, ans=0.125 2024-08-14 00:31:10,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2392630.0, ans=0.0 2024-08-14 00:31:12,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7400, loss[loss=0.103, beats_loss=0.006824, ecapa_loss=0.0002235, whisper_loss=0.09395, over 13059.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001615, whisper_loss=0.09175, over 3897871.67 frames. 
], batch size: 54, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:31:34,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2392830.0, ans=0.125 2024-08-14 00:32:01,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2024-08-14 00:32:02,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=15.0 2024-08-14 00:32:31,720 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 00:32:34,303 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7450, loss[loss=0.1073, beats_loss=0.01257, ecapa_loss=0.0001626, whisper_loss=0.09311, over 21248.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001614, whisper_loss=0.09122, over 3871798.64 frames. ], batch size: 85, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:32:34,473 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 00:32:49,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-08-14 00:32:55,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2393330.0, ans=0.0 2024-08-14 00:32:57,925 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 00:33:02,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-14 00:33:06,780 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-14 00:33:08,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2393430.0, ans=0.2 2024-08-14 00:33:18,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2393430.0, ans=0.125 2024-08-14 00:33:19,158 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.395e+01 2.642e+01 3.080e+01 4.669e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-14 00:33:32,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.26 vs. limit=22.5 2024-08-14 00:33:42,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2393630.0, ans=0.0 2024-08-14 00:34:22,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2024-08-14 00:34:24,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7500, loss[loss=0.1113, beats_loss=0.01125, ecapa_loss=0.0001621, whisper_loss=0.09845, over 22506.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001613, whisper_loss=0.09107, over 3877475.94 frames. ], batch size: 91, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:34:25,875 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 00:34:38,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. 
limit=15.0 2024-08-14 00:34:58,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2393930.0, ans=0.0 2024-08-14 00:35:14,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2394030.0, ans=0.0 2024-08-14 00:35:19,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2394030.0, ans=0.0 2024-08-14 00:35:29,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2394130.0, ans=0.125 2024-08-14 00:35:44,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7550, loss[loss=0.0877, beats_loss=0.01167, ecapa_loss=0.0001868, whisper_loss=0.07416, over 18283.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.00016, whisper_loss=0.09043, over 3875083.53 frames. ], batch size: 79, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:35:44,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2394230.0, ans=0.125 2024-08-14 00:36:16,113 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-14 00:36:20,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2394430.0, ans=0.2 2024-08-14 00:36:26,054 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.308e+01 2.563e+01 2.921e+01 3.982e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-14 00:36:43,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2394530.0, ans=0.2 2024-08-14 00:36:46,252 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-14 00:37:00,916 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-14 00:37:05,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7600, loss[loss=0.07944, beats_loss=0.01412, ecapa_loss=0.000173, whisper_loss=0.06359, over 16396.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0108, ecapa_loss=0.0001611, whisper_loss=0.09012, over 3862831.21 frames. ], batch size: 72, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:37:11,209 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 00:37:19,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5 2024-08-14 00:37:33,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2394830.0, ans=0.0 2024-08-14 00:37:45,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2024-08-14 00:37:49,869 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 00:37:52,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2395030.0, ans=0.125 2024-08-14 00:37:55,880 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 00:38:10,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2395130.0, ans=0.125 2024-08-14 00:38:12,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2395130.0, ans=0.125 2024-08-14 00:38:15,189 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 00:38:16,680 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 00:38:24,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7650, loss[loss=0.09621, beats_loss=0.01072, ecapa_loss=0.0002269, whisper_loss=0.08322, over 20343.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001604, whisper_loss=0.09031, over 3852533.11 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:38:26,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2395230.0, ans=0.5 2024-08-14 00:38:26,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2395230.0, ans=15.0 2024-08-14 00:38:28,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2395230.0, ans=0.1 2024-08-14 00:38:39,673 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 00:38:44,645 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 00:39:01,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2395430.0, ans=0.0 2024-08-14 00:39:07,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.336e+01 2.593e+01 2.907e+01 5.798e+01, threshold=5.186e+01, percent-clipped=1.0 2024-08-14 00:39:15,906 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 00:39:22,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2395530.0, ans=0.07 2024-08-14 00:39:27,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-14 00:39:28,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2395630.0, ans=0.0 2024-08-14 00:39:32,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-14 00:39:47,195 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7700, loss[loss=0.09352, beats_loss=0.01093, ecapa_loss=0.0001714, whisper_loss=0.08087, over 23021.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01082, ecapa_loss=0.0001606, whisper_loss=0.08939, over 3847169.54 frames. ], batch size: 93, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:39:49,484 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 00:40:05,264 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 00:40:05,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2395830.0, ans=0.125 2024-08-14 00:40:13,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2395830.0, ans=0.2 2024-08-14 00:40:22,588 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:40:33,127 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 00:40:35,816 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 00:40:42,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2024-08-14 00:40:44,033 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 00:40:51,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. limit=6.0 2024-08-14 00:41:04,169 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 00:41:07,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7750, loss[loss=0.1206, beats_loss=0.009703, ecapa_loss=0.0001667, whisper_loss=0.1093, over 19242.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001603, whisper_loss=0.09006, over 3857713.52 frames. 
], batch size: 76, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:41:09,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2396230.0, ans=0.125 2024-08-14 00:41:20,105 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 00:41:28,783 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 00:41:35,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2396330.0, ans=0.2 2024-08-14 00:41:42,385 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 00:41:47,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2396430.0, ans=0.5 2024-08-14 00:41:49,007 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.485e+01 2.781e+01 3.099e+01 5.095e+01, threshold=5.562e+01, percent-clipped=0.0 2024-08-14 00:41:50,805 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:41:52,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2396430.0, ans=0.125 2024-08-14 00:41:56,713 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 00:41:58,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2396530.0, ans=0.0 2024-08-14 00:42:07,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2396530.0, ans=0.125 2024-08-14 00:42:26,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7800, loss[loss=0.1112, beats_loss=0.009785, ecapa_loss=0.0002027, whisper_loss=0.09942, over 22897.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001607, whisper_loss=0.09007, over 3853926.44 frames. ], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:42:26,762 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-14 00:42:26,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2396730.0, ans=0.1 2024-08-14 00:42:57,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2396830.0, ans=0.125 2024-08-14 00:42:58,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2396930.0, ans=0.0 2024-08-14 00:43:23,665 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 00:43:25,701 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 00:43:27,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5 2024-08-14 00:43:31,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.75 vs. 
limit=10.0 2024-08-14 00:43:49,896 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7850, loss[loss=0.07233, beats_loss=0.01208, ecapa_loss=0.0001685, whisper_loss=0.05857, over 19906.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001595, whisper_loss=0.09034, over 3872690.31 frames. ], batch size: 83, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:44:10,291 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 00:44:15,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2397330.0, ans=0.2 2024-08-14 00:44:18,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2397330.0, ans=0.0 2024-08-14 00:44:21,326 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 00:44:24,430 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 00:44:30,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.344e+01 2.594e+01 2.942e+01 8.076e+01, threshold=5.188e+01, percent-clipped=2.0 2024-08-14 00:44:37,850 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 35 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 00:44:48,904 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 00:44:50,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2397630.0, ans=0.125 2024-08-14 00:44:58,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2397630.0, ans=0.125 2024-08-14 00:45:01,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2397630.0, ans=0.125 2024-08-14 00:45:08,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7900, loss[loss=0.125, beats_loss=0.009705, ecapa_loss=0.0001189, whisper_loss=0.1141, over 18973.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.000159, whisper_loss=0.0913, over 3883101.97 frames. ], batch size: 69, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:45:13,139 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.138e+00 2024-08-14 00:45:34,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2397830.0, ans=0.1 2024-08-14 00:45:40,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2397930.0, ans=0.125 2024-08-14 00:46:11,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2398130.0, ans=22.5 2024-08-14 00:46:19,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2398130.0, ans=0.0 2024-08-14 00:46:24,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.33 vs. 
limit=15.0 2024-08-14 00:46:27,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 7950, loss[loss=0.08048, beats_loss=0.01192, ecapa_loss=0.0001743, whisper_loss=0.06681, over 21408.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01095, ecapa_loss=0.0001586, whisper_loss=0.09087, over 3897375.25 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:46:43,401 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2024-08-14 00:46:45,500 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 00:47:06,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.380e+01 2.671e+01 3.071e+01 4.593e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-14 00:47:07,957 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 24 from Vox, 16 fro AS 2024-08-14 00:47:11,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2398530.0, ans=0.125 2024-08-14 00:47:14,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2398530.0, ans=0.125 2024-08-14 00:47:28,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2398630.0, ans=0.125 2024-08-14 00:47:41,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8000, loss[loss=0.1188, beats_loss=0.009778, ecapa_loss=0.0001385, whisper_loss=0.1077, over 23038.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01098, ecapa_loss=0.0001579, whisper_loss=0.09097, over 3891621.58 frames. ], batch size: 90, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:47:43,388 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
33 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 00:47:48,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-14 00:47:52,029 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 00:48:00,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2398830.0, ans=0.0 2024-08-14 00:48:01,200 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 00:48:13,515 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-14 00:48:48,192 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 00:48:51,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-08-14 00:48:57,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8050, loss[loss=0.06341, beats_loss=0.01405, ecapa_loss=0.0001547, whisper_loss=0.04781, over 17025.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001591, whisper_loss=0.09159, over 3893082.48 frames. ], batch size: 73, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:49:03,052 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 00:49:14,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-14 00:49:21,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. 
limit=10.0 2024-08-14 00:49:33,873 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.422e+01 2.734e+01 3.214e+01 1.918e+02, threshold=5.469e+01, percent-clipped=2.0 2024-08-14 00:49:37,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.17 vs. limit=22.5 2024-08-14 00:49:49,043 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-14 00:50:02,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2399630.0, ans=0.05 2024-08-14 00:50:10,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8100, loss[loss=0.09717, beats_loss=0.01062, ecapa_loss=0.0001851, whisper_loss=0.0847, over 20137.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01084, ecapa_loss=0.0001591, whisper_loss=0.09149, over 3899212.58 frames. ], batch size: 84, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:50:17,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2399730.0, ans=0.0 2024-08-14 00:50:19,342 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.033e+01 2024-08-14 00:50:34,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2399830.0, ans=0.0 2024-08-14 00:50:41,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2399930.0, ans=0.09899494936611666 2024-08-14 00:50:49,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.21 vs. 
limit=15.0 2024-08-14 00:50:51,796 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-240000.pt 2024-08-14 00:51:16,924 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:51:16,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2400130.0, ans=0.125 2024-08-14 00:51:21,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2400130.0, ans=0.0 2024-08-14 00:51:32,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8150, loss[loss=0.1014, beats_loss=0.01259, ecapa_loss=0.0001381, whisper_loss=0.0874, over 22222.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001592, whisper_loss=0.09181, over 3900802.82 frames. ], batch size: 90, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:51:40,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2400230.0, ans=0.0 2024-08-14 00:52:12,008 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 00:52:13,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.374e+01 2.607e+01 2.976e+01 8.538e+01, threshold=5.213e+01, percent-clipped=1.0 2024-08-14 00:52:25,781 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-14 00:52:29,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. 
limit=22.5 2024-08-14 00:52:39,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2400630.0, ans=0.0 2024-08-14 00:52:49,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8200, loss[loss=0.09553, beats_loss=0.01174, ecapa_loss=0.0001813, whisper_loss=0.08197, over 20362.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001602, whisper_loss=0.09103, over 3893785.86 frames. ], batch size: 87, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:52:53,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2400730.0, ans=0.1 2024-08-14 00:53:05,700 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 17 from LS+wenet, 18 from Vox, 52 fro AS 2024-08-14 00:53:05,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2400830.0, ans=0.2 2024-08-14 00:53:33,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2401030.0, ans=0.125 2024-08-14 00:53:44,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2401030.0, ans=0.125 2024-08-14 00:54:02,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2401130.0, ans=0.125 2024-08-14 00:54:03,250 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 00:54:06,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8250, loss[loss=0.1098, beats_loss=0.009247, ecapa_loss=0.0001783, whisper_loss=0.09877, over 19814.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001606, whisper_loss=0.09113, over 3885696.45 frames. 
], batch size: 79, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:54:18,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2401230.0, ans=0.125 2024-08-14 00:54:23,285 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 13 from Vox, 48 fro AS 2024-08-14 00:54:24,821 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 00:54:32,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2401330.0, ans=0.1 2024-08-14 00:54:46,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.415e+01 2.692e+01 3.047e+01 4.213e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-14 00:54:46,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2401430.0, ans=0.0 2024-08-14 00:54:58,028 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 12 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 00:55:26,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8300, loss[loss=0.1013, beats_loss=0.01105, ecapa_loss=0.0001438, whisper_loss=0.08882, over 14067.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.000159, whisper_loss=0.09077, over 3894005.53 frames. ], batch size: 55, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:55:28,037 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 00:55:30,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2401730.0, ans=0.125 2024-08-14 00:55:33,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2401730.0, ans=0.125 2024-08-14 00:55:33,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2401730.0, ans=0.5 2024-08-14 00:55:45,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2401830.0, ans=0.2 2024-08-14 00:55:53,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2401830.0, ans=0.0 2024-08-14 00:55:58,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2401930.0, ans=0.0 2024-08-14 00:56:07,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-08-14 00:56:23,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2402030.0, ans=0.2 2024-08-14 00:56:25,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2402030.0, ans=0.125 2024-08-14 00:56:31,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2024-08-14 00:56:43,673 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 00:56:50,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2402230.0, ans=0.125 2024-08-14 00:56:51,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2402230.0, ans=0.04949747468305833 2024-08-14 00:56:52,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8350, loss[loss=0.1019, beats_loss=0.009607, ecapa_loss=0.0002146, whisper_loss=0.09018, over 16793.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.000159, whisper_loss=0.09115, over 3900363.20 frames. ], batch size: 69, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:57:12,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2402330.0, ans=0.125 2024-08-14 00:57:36,821 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.282e+01 2.635e+01 3.067e+01 5.691e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-14 00:57:39,074 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 00:57:52,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2402530.0, ans=0.0 2024-08-14 00:57:58,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2402530.0, ans=0.125 2024-08-14 00:58:11,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2402630.0, ans=0.0 2024-08-14 00:58:18,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8400, loss[loss=0.1139, beats_loss=0.008744, ecapa_loss=0.0001594, whisper_loss=0.1036, over 18753.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.0001597, whisper_loss=0.0919, over 3911033.23 frames. ], batch size: 71, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:58:20,799 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 00:58:29,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2402730.0, ans=0.125 2024-08-14 00:58:48,651 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:58:51,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2402930.0, ans=0.1 2024-08-14 00:59:42,668 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8450, loss[loss=0.08932, beats_loss=0.0123, ecapa_loss=0.0001357, whisper_loss=0.07566, over 21083.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001602, whisper_loss=0.09115, over 3857400.33 frames. ], batch size: 84, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:59:44,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2403230.0, ans=0.5 2024-08-14 00:59:57,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2403230.0, ans=0.0 2024-08-14 01:00:00,154 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
18 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 01:00:08,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2403330.0, ans=0.125 2024-08-14 01:00:26,976 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.360e+01 2.603e+01 2.918e+01 4.445e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 01:00:31,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2403430.0, ans=10.0 2024-08-14 01:00:41,451 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 01:00:59,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2403630.0, ans=0.125 2024-08-14 01:01:05,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2403630.0, ans=0.1 2024-08-14 01:01:08,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8500, loss[loss=0.1088, beats_loss=0.009872, ecapa_loss=0.0001563, whisper_loss=0.09732, over 21046.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001601, whisper_loss=0.09109, over 3871441.93 frames. 
], batch size: 84, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:01:09,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2403730.0, ans=0.0 2024-08-14 01:01:15,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2403730.0, ans=0.125 2024-08-14 01:02:28,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2404130.0, ans=0.0 2024-08-14 01:02:33,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8550, loss[loss=0.1077, beats_loss=0.01085, ecapa_loss=0.0001515, whisper_loss=0.09536, over 23202.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001598, whisper_loss=0.09221, over 3906706.42 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:02:52,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2404330.0, ans=0.05 2024-08-14 01:02:53,382 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 01:02:55,767 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 01:03:04,543 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 01:03:18,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.347e+01 2.626e+01 2.928e+01 4.701e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 01:03:22,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2404430.0, ans=0.0 2024-08-14 01:03:23,732 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
28 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 01:03:27,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2024-08-14 01:03:49,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2404630.0, ans=0.2 2024-08-14 01:03:49,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=22.5 2024-08-14 01:03:51,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2404630.0, ans=0.2 2024-08-14 01:04:02,839 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8600, loss[loss=0.08794, beats_loss=0.01396, ecapa_loss=0.0001477, whisper_loss=0.0725, over 15918.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001601, whisper_loss=0.09186, over 3869523.64 frames. ], batch size: 67, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:04:26,639 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 34 from Vox, 21 fro AS 2024-08-14 01:04:34,307 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 01:04:59,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=22.5 2024-08-14 01:05:08,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-14 01:05:29,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8650, loss[loss=0.07343, beats_loss=0.01492, ecapa_loss=0.0001551, whisper_loss=0.05696, over 20831.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001612, whisper_loss=0.09178, over 3871784.81 frames. ], batch size: 88, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:05:35,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2405230.0, ans=0.5 2024-08-14 01:06:08,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.01 vs. limit=22.5 2024-08-14 01:06:15,013 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.307e+01 2.549e+01 2.918e+01 2.030e+02, threshold=5.098e+01, percent-clipped=1.0 2024-08-14 01:06:23,903 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 01:06:25,330 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-14 01:06:26,516 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 01:06:36,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2405530.0, ans=0.0 2024-08-14 01:06:41,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2405630.0, ans=0.0 2024-08-14 01:06:58,174 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8700, loss[loss=0.1101, beats_loss=0.01003, ecapa_loss=0.0001493, whisper_loss=0.09858, over 15220.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001607, whisper_loss=0.09166, over 3880773.71 frames. 
], batch size: 60, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:07:25,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2405830.0, ans=0.0 2024-08-14 01:07:27,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-14 01:07:58,928 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 01:08:22,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8750, loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001347, whisper_loss=0.09096, over 16493.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001603, whisper_loss=0.0919, over 3865747.37 frames. ], batch size: 62, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:08:58,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2406330.0, ans=0.015 2024-08-14 01:08:59,493 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 01:09:06,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-14 01:09:12,051 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-14 01:09:14,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.356e+01 2.644e+01 3.033e+01 3.229e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 01:09:28,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.49 vs. 
limit=10.0 2024-08-14 01:09:35,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2406530.0, ans=0.125 2024-08-14 01:09:44,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2406630.0, ans=0.0 2024-08-14 01:09:59,016 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 01:10:05,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8800, loss[loss=0.0925, beats_loss=0.01264, ecapa_loss=0.0001844, whisper_loss=0.07802, over 21751.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001606, whisper_loss=0.09087, over 3890576.72 frames. ], batch size: 92, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:10:05,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2406730.0, ans=0.1 2024-08-14 01:10:11,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2024-08-14 01:10:20,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2406830.0, ans=0.2 2024-08-14 01:10:22,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2406830.0, ans=0.125 2024-08-14 01:10:29,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2406830.0, ans=0.125 2024-08-14 01:10:38,441 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
37 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 01:10:50,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2024-08-14 01:11:52,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8850, loss[loss=0.0982, beats_loss=0.009152, ecapa_loss=0.0001912, whisper_loss=0.08714, over 18766.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001596, whisper_loss=0.09051, over 3876005.81 frames. ], batch size: 79, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:12:33,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2407330.0, ans=0.025 2024-08-14 01:12:40,461 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:12:44,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2407430.0, ans=0.0 2024-08-14 01:12:53,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.365e+01 2.669e+01 3.063e+01 4.484e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-14 01:12:54,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2407430.0, ans=0.2 2024-08-14 01:13:03,013 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 01:13:05,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2407530.0, ans=0.125 2024-08-14 01:13:05,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2407530.0, ans=0.125 2024-08-14 01:13:14,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2407530.0, ans=0.125 2024-08-14 01:13:23,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2407630.0, ans=0.0 2024-08-14 01:13:30,357 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 01:13:45,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8900, loss[loss=0.0938, beats_loss=0.01196, ecapa_loss=0.0001841, whisper_loss=0.08, over 20513.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01094, ecapa_loss=0.0001588, whisper_loss=0.09023, over 3870877.56 frames. ], batch size: 87, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:13:46,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2024-08-14 01:14:25,995 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 15 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 01:14:32,286 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 01:14:34,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2024-08-14 01:14:42,483 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 01:15:07,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2408130.0, ans=0.2 2024-08-14 01:15:14,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2408130.0, ans=0.125 2024-08-14 01:15:17,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2408130.0, ans=0.1 2024-08-14 01:15:26,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 8950, loss[loss=0.1091, beats_loss=0.01066, ecapa_loss=0.0001763, whisper_loss=0.09666, over 22154.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001587, whisper_loss=0.09082, over 3886450.75 frames. ], batch size: 90, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:15:39,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2408230.0, ans=0.125 2024-08-14 01:15:46,188 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-14 01:15:48,596 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 01:15:58,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2024-08-14 01:16:10,062 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 01:16:15,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.300e+01 2.488e+01 2.810e+01 4.417e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-14 01:16:30,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. 
limit=10.0 2024-08-14 01:16:42,633 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 01:16:50,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9000, loss[loss=0.1105, beats_loss=0.008668, ecapa_loss=0.0002241, whisper_loss=0.09961, over 14814.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01088, ecapa_loss=0.0001596, whisper_loss=0.09057, over 3875625.93 frames. ], batch size: 60, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:16:50,481 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 01:17:32,623 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005618, whisper_loss=0.2481, over 922467.00 frames. 2024-08-14 01:17:50,462 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on SV_voxceleb1: loss=0.004363, beats_loss=0, ecapa_loss=0.0004363, whisper_loss=0, over 939242.00 frames. 2024-08-14 01:20:00,227 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on AT_audioset: loss=0.02365, beats_loss=0.02365, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 01:20:00,231 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 01:20:10,056 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 01:20:14,033 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 01:20:27,404 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 01:20:36,963 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 01:20:42,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2409030.0, ans=0.125 2024-08-14 01:20:42,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2409030.0, ans=0.0 2024-08-14 01:20:53,933 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 01:20:55,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2409030.0, ans=0.125 2024-08-14 01:20:58,374 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 01:21:13,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9050, loss[loss=0.1148, beats_loss=0.01026, ecapa_loss=0.0001529, whisper_loss=0.103, over 20382.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001593, whisper_loss=0.09123, over 3846517.25 frames. ], batch size: 81, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:21:15,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=12.0 2024-08-14 01:21:26,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2409330.0, ans=0.0 2024-08-14 01:21:48,060 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 01:21:52,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.446e+01 2.670e+01 2.988e+01 4.436e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-14 01:22:15,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2409630.0, ans=0.125 2024-08-14 01:22:28,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9100, loss[loss=0.1158, beats_loss=0.01016, ecapa_loss=0.0001589, whisper_loss=0.1041, over 22241.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01071, ecapa_loss=0.0001606, whisper_loss=0.09222, over 3861724.35 frames. ], batch size: 89, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:22:41,360 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 01:22:43,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2409830.0, ans=0.125 2024-08-14 01:22:45,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-14 01:22:54,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2409830.0, ans=0.125 2024-08-14 01:23:03,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2409930.0, ans=0.0 2024-08-14 01:23:04,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2409930.0, ans=0.0 2024-08-14 01:23:16,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. 
limit=15.0 2024-08-14 01:23:31,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2410130.0, ans=0.09899494936611666 2024-08-14 01:23:46,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2410230.0, ans=0.125 2024-08-14 01:23:48,022 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9150, loss[loss=0.1175, beats_loss=0.008024, ecapa_loss=0.0002239, whisper_loss=0.1073, over 21423.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001599, whisper_loss=0.09209, over 3900911.81 frames. ], batch size: 88, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:23:59,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2410230.0, ans=0.1 2024-08-14 01:24:09,011 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 01:24:12,323 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 01:24:12,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2410330.0, ans=0.125 2024-08-14 01:24:29,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.433e+01 2.654e+01 2.886e+01 8.462e+01, threshold=5.308e+01, percent-clipped=1.0 2024-08-14 01:24:49,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2410530.0, ans=0.1 2024-08-14 01:24:58,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2410630.0, ans=0.035 2024-08-14 01:25:01,112 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 13 from Vox, 29 from AS 2024-08-14 01:25:03,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2410630.0, ans=0.2 2024-08-14 01:25:08,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9200, loss[loss=0.1095, beats_loss=0.01053, ecapa_loss=0.0001693, whisper_loss=0.0973, over 22341.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01084, ecapa_loss=0.0001593, whisper_loss=0.09208, over 3916270.39 frames. ], batch size: 90, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:25:11,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2410730.0, ans=0.0 2024-08-14 01:25:22,161 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS 2024-08-14 01:25:30,583 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 15 from Vox, 17 from AS 2024-08-14 01:25:31,929 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 from AS 2024-08-14 01:25:44,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2410930.0, ans=0.0 2024-08-14 01:25:52,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2410930.0, ans=0.0 2024-08-14 01:25:56,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2411030.0, ans=0.125 2024-08-14 01:26:01,199 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts.
15 from LS+wenet, 25 from Vox, 18 from AS 2024-08-14 01:26:04,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2411030.0, ans=0.0 2024-08-14 01:26:06,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2411030.0, ans=0.1 2024-08-14 01:26:26,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2411130.0, ans=0.125 2024-08-14 01:26:30,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9250, loss[loss=0.09457, beats_loss=0.01152, ecapa_loss=0.0001704, whisper_loss=0.08135, over 16054.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01078, ecapa_loss=0.0001605, whisper_loss=0.09256, over 3923207.88 frames. ], batch size: 66, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:26:53,388 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 16 from Vox, 46 from AS 2024-08-14 01:27:07,593 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 39 from LS+wenet, 16 from Vox, 35 from AS 2024-08-14 01:27:09,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-08-14 01:27:10,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.291e+01 2.608e+01 2.884e+01 5.366e+01, threshold=5.217e+01, percent-clipped=1.0 2024-08-14 01:27:19,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2411530.0, ans=0.0 2024-08-14 01:27:20,341 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
31 from LS+wenet, 30 from Vox, 31 from AS 2024-08-14 01:27:22,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2411530.0, ans=0.125 2024-08-14 01:27:28,068 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 from AS 2024-08-14 01:27:33,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2411630.0, ans=0.125 2024-08-14 01:27:43,677 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 from AS 2024-08-14 01:27:49,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9300, loss[loss=0.08757, beats_loss=0.01252, ecapa_loss=0.0001674, whisper_loss=0.07338, over 21893.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001604, whisper_loss=0.09133, over 3911456.98 frames. ], batch size: 90, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:28:05,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2411830.0, ans=0.2 2024-08-14 01:28:12,053 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS 2024-08-14 01:28:18,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2411830.0, ans=15.0 2024-08-14 01:28:22,632 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 18 from Vox, 35 from AS 2024-08-14 01:28:25,664 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts.
20 from LS+wenet, 37 from Vox, 38 from AS 2024-08-14 01:28:39,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2412030.0, ans=0.125 2024-08-14 01:28:50,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.73 vs. limit=15.0 2024-08-14 01:28:51,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2412130.0, ans=0.0 2024-08-14 01:28:51,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2412130.0, ans=0.0 2024-08-14 01:29:07,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9350, loss[loss=0.107, beats_loss=0.01064, ecapa_loss=0.0001574, whisper_loss=0.0948, over 22960.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01088, ecapa_loss=0.0001609, whisper_loss=0.09095, over 3890684.47 frames. ], batch size: 90, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:29:46,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2024-08-14 01:29:47,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.279e+01 2.558e+01 2.915e+01 7.467e+01, threshold=5.116e+01, percent-clipped=2.0 2024-08-14 01:30:13,350 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 28 from Vox, 25 from AS 2024-08-14 01:30:26,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9400, loss[loss=0.09547, beats_loss=0.01073, ecapa_loss=0.0001597, whisper_loss=0.08314, over 19064.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01092, ecapa_loss=0.0001609, whisper_loss=0.08997, over 3867384.80 frames.
], batch size: 78, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:30:30,656 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.45 vs. limit=22.5 2024-08-14 01:30:38,295 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 19 from Vox, 47 from AS 2024-08-14 01:30:41,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2412830.0, ans=0.125 2024-08-14 01:30:55,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2412830.0, ans=0.125 2024-08-14 01:31:07,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2412930.0, ans=0.2 2024-08-14 01:31:35,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2413130.0, ans=0.0 2024-08-14 01:31:47,758 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 24 from Vox, 42 from AS 2024-08-14 01:31:48,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9450, loss[loss=0.08586, beats_loss=0.01303, ecapa_loss=0.0001889, whisper_loss=0.07094, over 20481.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01091, ecapa_loss=0.0001619, whisper_loss=0.08885, over 3874063.46 frames.
], batch size: 87, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:31:50,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2413230.0, ans=0.1 2024-08-14 01:31:55,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2413230.0, ans=0.125 2024-08-14 01:32:07,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2413330.0, ans=0.1 2024-08-14 01:32:18,206 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 13 from Vox, 32 from AS 2024-08-14 01:32:25,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-08-14 01:32:35,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.507e+01 2.797e+01 3.259e+01 9.131e+01, threshold=5.593e+01, percent-clipped=2.0 2024-08-14 01:33:00,456 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 from AS 2024-08-14 01:33:00,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.50 vs. limit=22.5 2024-08-14 01:33:16,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9500, loss[loss=0.1176, beats_loss=0.01128, ecapa_loss=0.0001707, whisper_loss=0.1047, over 19531.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.011, ecapa_loss=0.000161, whisper_loss=0.08849, over 3896833.61 frames. ], batch size: 77, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:33:29,600 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts.
18 from LS+wenet, 22 from Vox, 36 from AS 2024-08-14 01:33:50,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2413930.0, ans=0.0 2024-08-14 01:33:50,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-08-14 01:33:53,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2413930.0, ans=0.0 2024-08-14 01:34:19,888 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-14 01:34:36,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9550, loss[loss=0.1158, beats_loss=0.01064, ecapa_loss=0.0001352, whisper_loss=0.1038, over 21324.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01093, ecapa_loss=0.0001624, whisper_loss=0.08824, over 3871069.98 frames. ], batch size: 84, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:34:49,032 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-14 01:34:52,685 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 from AS 2024-08-14 01:35:00,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2414330.0, ans=0.0 2024-08-14 01:35:03,873 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 01:35:11,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.93 vs.
limit=15.0 2024-08-14 01:35:13,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2414430.0, ans=0.125 2024-08-14 01:35:17,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.395e+01 2.666e+01 3.161e+01 6.328e+01, threshold=5.331e+01, percent-clipped=1.0 2024-08-14 01:35:20,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2414430.0, ans=6.0 2024-08-14 01:35:32,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2414530.0, ans=0.2 2024-08-14 01:35:47,719 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.760e-02 2024-08-14 01:35:52,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2024-08-14 01:35:57,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9600, loss[loss=0.1174, beats_loss=0.0101, ecapa_loss=0.0001129, whisper_loss=0.1061, over 19485.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01081, ecapa_loss=0.0001627, whisper_loss=0.08878, over 3826591.58 frames. ], batch size: 71, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:35:59,547 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
30 from LS+wenet, 24 from Vox, 41 from AS 2024-08-14 01:36:10,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2414730.0, ans=0.125 2024-08-14 01:36:13,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2414830.0, ans=0.125 2024-08-14 01:36:35,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2024-08-14 01:36:38,655 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 from AS 2024-08-14 01:36:54,526 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS 2024-08-14 01:37:21,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2024-08-14 01:37:23,190 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 from AS 2024-08-14 01:37:26,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9650, loss[loss=0.107, beats_loss=0.01124, ecapa_loss=0.0001525, whisper_loss=0.09424, over 22223.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001624, whisper_loss=0.08991, over 3813252.14 frames.
], batch size: 89, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:37:53,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2415330.0, ans=0.0 2024-08-14 01:38:09,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.345e+01 2.616e+01 2.966e+01 4.263e+01, threshold=5.231e+01, percent-clipped=0.0 2024-08-14 01:38:11,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.69 vs. limit=22.5 2024-08-14 01:38:14,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2415530.0, ans=0.0 2024-08-14 01:38:26,660 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 01:38:38,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2415630.0, ans=0.125 2024-08-14 01:38:45,356 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 from AS 2024-08-14 01:38:49,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9700, loss[loss=0.101, beats_loss=0.01076, ecapa_loss=0.0001631, whisper_loss=0.08857, over 19501.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001634, whisper_loss=0.09015, over 3816116.13 frames. ], batch size: 78, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:38:52,387 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 from AS 2024-08-14 01:39:04,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-14 01:39:30,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.23 vs.
limit=15.0 2024-08-14 01:39:37,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2416030.0, ans=0.0 2024-08-14 01:39:46,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2024-08-14 01:39:47,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2416030.0, ans=0.125 2024-08-14 01:39:50,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2416030.0, ans=0.125 2024-08-14 01:40:10,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9750, loss[loss=0.08028, beats_loss=0.01235, ecapa_loss=0.0001329, whisper_loss=0.06661, over 16981.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01074, ecapa_loss=0.0001615, whisper_loss=0.08922, over 3816800.22 frames. ], batch size: 66, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:40:10,476 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 from AS 2024-08-14 01:40:14,135 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 15 from Vox, 33 from AS 2024-08-14 01:40:19,000 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 from AS 2024-08-14 01:40:27,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2416330.0, ans=0.035 2024-08-14 01:40:36,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2416330.0, ans=0.125 2024-08-14 01:40:41,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs.
limit=6.0 2024-08-14 01:40:42,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2024-08-14 01:40:47,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2416430.0, ans=0.0 2024-08-14 01:40:51,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.366e+01 2.693e+01 3.078e+01 7.887e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-14 01:40:59,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2416530.0, ans=0.2 2024-08-14 01:41:10,298 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 37 from LS+wenet, 15 from Vox, 22 from AS 2024-08-14 01:41:11,569 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 from AS 2024-08-14 01:41:13,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2416630.0, ans=0.0 2024-08-14 01:41:26,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9800, loss[loss=0.0955, beats_loss=0.009501, ecapa_loss=0.0001924, whisper_loss=0.08408, over 16750.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001607, whisper_loss=0.09038, over 3826290.85 frames. ], batch size: 70, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:41:27,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2416730.0, ans=0.1 2024-08-14 01:41:55,390 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 from AS 2024-08-14 01:41:57,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2416930.0, ans=0.2 2024-08-14 01:42:02,139 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
29 from LS+wenet, 16 from Vox, 38 from AS 2024-08-14 01:42:22,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-08-14 01:42:23,567 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 27 from LS+wenet, 16 from Vox, 13 from AS 2024-08-14 01:42:26,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2417130.0, ans=0.0 2024-08-14 01:42:36,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2417230.0, ans=0.125 2024-08-14 01:42:36,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5 2024-08-14 01:42:37,521 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9850, loss[loss=0.09797, beats_loss=0.01142, ecapa_loss=0.000127, whisper_loss=0.08528, over 21961.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01081, ecapa_loss=0.0001592, whisper_loss=0.09018, over 3849289.73 frames. ], batch size: 88, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:42:49,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2417330.0, ans=0.125 2024-08-14 01:42:53,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2024-08-14 01:42:57,177 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts.
23 from LS+wenet, 11 from Vox, 29 from AS 2024-08-14 01:43:11,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.319e+01 2.530e+01 2.883e+01 5.906e+01, threshold=5.059e+01, percent-clipped=1.0 2024-08-14 01:43:12,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2417430.0, ans=0.1 2024-08-14 01:43:20,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-14 01:43:22,637 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 from AS 2024-08-14 01:43:25,657 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 from AS 2024-08-14 01:43:31,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-14 01:43:40,624 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 from AS 2024-08-14 01:43:42,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2417630.0, ans=0.125 2024-08-14 01:43:44,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9900, loss[loss=0.1022, beats_loss=0.01253, ecapa_loss=0.0001351, whisper_loss=0.08827, over 16807.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.000159, whisper_loss=0.09136, over 3885683.88 frames.
], batch size: 71, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:43:44,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2417730.0, ans=0.125 2024-08-14 01:43:49,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2024-08-14 01:43:55,291 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 from AS 2024-08-14 01:44:01,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2417830.0, ans=0.1 2024-08-14 01:44:15,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2417930.0, ans=0.0 2024-08-14 01:44:20,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2024-08-14 01:44:31,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2418030.0, ans=0.1 2024-08-14 01:44:40,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.10 vs. limit=10.0 2024-08-14 01:44:52,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 9950, loss[loss=0.1172, beats_loss=0.008888, ecapa_loss=0.0001862, whisper_loss=0.1065, over 21654.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001611, whisper_loss=0.09164, over 3893352.19 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:45:01,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs.
limit=15.0 2024-08-14 01:45:07,174 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 from AS 2024-08-14 01:45:24,589 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 from AS 2024-08-14 01:45:26,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.412e+01 2.652e+01 3.138e+01 4.371e+01, threshold=5.303e+01, percent-clipped=0.0 2024-08-14 01:45:30,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-08-14 01:45:32,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2418530.0, ans=15.0 2024-08-14 01:45:35,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2418530.0, ans=0.125 2024-08-14 01:45:59,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=22.5 2024-08-14 01:45:59,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10000, loss[loss=0.07918, beats_loss=0.009233, ecapa_loss=0.0001745, whisper_loss=0.0682, over 18578.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001619, whisper_loss=0.09104, over 3865722.21 frames. ], batch size: 73, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:46:07,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2418730.0, ans=0.1 2024-08-14 01:46:07,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=2418730.0, ans=12.0 2024-08-14 01:46:16,168 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts.
26 from LS+wenet, 18 from Vox, 31 from AS 2024-08-14 01:46:24,048 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 from AS 2024-08-14 01:46:27,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2418930.0, ans=0.0 2024-08-14 01:46:38,626 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 from AS 2024-08-14 01:46:41,444 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 from AS 2024-08-14 01:46:51,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2419130.0, ans=0.125 2024-08-14 01:46:52,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2419130.0, ans=0.0 2024-08-14 01:46:56,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2419130.0, ans=0.1 2024-08-14 01:47:06,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10050, loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001263, whisper_loss=0.0917, over 16007.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0108, ecapa_loss=0.0001604, whisper_loss=0.09071, over 3846467.25 frames. ], batch size: 60, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:47:07,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2419230.0, ans=0.125 2024-08-14 01:47:12,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2419230.0, ans=0.125 2024-08-14 01:47:17,843 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
15 from LS+wenet, 20 from Vox, 34 from AS 2024-08-14 01:47:18,167 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:47:23,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2419330.0, ans=0.1 2024-08-14 01:47:29,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2419330.0, ans=0.05 2024-08-14 01:47:40,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2419430.0, ans=0.0 2024-08-14 01:47:42,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.384e+01 2.686e+01 2.960e+01 2.282e+02, threshold=5.371e+01, percent-clipped=3.0 2024-08-14 01:47:43,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2024-08-14 01:47:47,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2419530.0, ans=0.125 2024-08-14 01:47:51,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2419530.0, ans=0.125 2024-08-14 01:47:57,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=12.0 2024-08-14 01:48:14,341 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10100, loss[loss=0.1157, beats_loss=0.01048, ecapa_loss=0.0001162, whisper_loss=0.104, over 21437.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01083, ecapa_loss=0.0001599, whisper_loss=0.09084, over 3888201.27 frames.
], batch size: 78, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:48:40,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2419930.0, ans=0.125 2024-08-14 01:48:42,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2419930.0, ans=0.125 2024-08-14 01:48:48,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2419930.0, ans=0.125 2024-08-14 01:48:54,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-14 01:48:55,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2420030.0, ans=0.125 2024-08-14 01:49:00,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2420030.0, ans=0.0 2024-08-14 01:49:22,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2420130.0, ans=0.0 2024-08-14 01:49:24,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10150, loss[loss=0.1148, beats_loss=0.01122, ecapa_loss=0.0001736, whisper_loss=0.1019, over 22146.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001602, whisper_loss=0.09078, over 3906798.50 frames. ], batch size: 91, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:49:31,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2420230.0, ans=0.0 2024-08-14 01:49:32,542 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 from AS 2024-08-14 01:49:53,604 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
28 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-14 01:50:00,595 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 01:50:05,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.404e+01 2.645e+01 2.951e+01 4.259e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-14 01:50:12,709 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 01:50:17,332 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 01:50:19,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=15.0 2024-08-14 01:50:20,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2420530.0, ans=0.0 2024-08-14 01:50:37,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2420630.0, ans=0.125 2024-08-14 01:50:38,984 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 01:50:43,668 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10200, loss[loss=0.1157, beats_loss=0.0098, ecapa_loss=0.0001734, whisper_loss=0.1042, over 22761.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001605, whisper_loss=0.09103, over 3908591.61 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:50:54,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2420730.0, ans=0.0 2024-08-14 01:51:00,336 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 01:51:18,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2420930.0, ans=0.0 2024-08-14 01:51:24,477 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 01:51:41,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2024-08-14 01:51:49,413 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-14 01:52:06,315 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 01:52:13,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10250, loss[loss=0.09598, beats_loss=0.01067, ecapa_loss=0.0001807, whisper_loss=0.08351, over 21333.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001613, whisper_loss=0.09133, over 3961577.42 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:52:20,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2421230.0, ans=0.2 2024-08-14 01:52:23,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2421230.0, ans=0.0 2024-08-14 01:52:28,795 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.064e-02 2024-08-14 01:53:00,012 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 19 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-14 01:53:01,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.474e+01 2.733e+01 3.124e+01 2.948e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-14 01:53:05,263 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
22 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-14 01:53:12,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2421530.0, ans=0.0 2024-08-14 01:53:20,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2024-08-14 01:53:26,730 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 01:53:32,407 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 01:53:42,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10300, loss[loss=0.1049, beats_loss=0.01119, ecapa_loss=0.0001285, whisper_loss=0.0924, over 18926.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001612, whisper_loss=0.0912, over 3938239.41 frames. ], batch size: 76, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:53:43,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2421730.0, ans=0.125 2024-08-14 01:53:50,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2421730.0, ans=0.1 2024-08-14 01:54:07,276 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 01:54:27,196 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 01:54:41,820 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 01:54:42,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2422030.0, ans=0.0 2024-08-14 01:54:48,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2422030.0, ans=0.125 2024-08-14 01:54:54,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2422130.0, ans=0.1 2024-08-14 01:55:01,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2422130.0, ans=0.025 2024-08-14 01:55:06,884 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 01:55:07,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2422130.0, ans=0.2 2024-08-14 01:55:11,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10350, loss[loss=0.07883, beats_loss=0.01034, ecapa_loss=0.0001798, whisper_loss=0.06669, over 14850.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001605, whisper_loss=0.09139, over 3954369.89 frames. ], batch size: 62, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:55:13,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2422230.0, ans=0.0 2024-08-14 01:55:56,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.357e+01 2.601e+01 3.091e+01 4.779e+01, threshold=5.203e+01, percent-clipped=0.0 2024-08-14 01:56:01,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2422530.0, ans=0.1 2024-08-14 01:56:03,375 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-14 01:56:03,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2422530.0, ans=0.1 2024-08-14 01:56:04,394 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-14 01:56:12,504 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 01:56:17,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-08-14 01:56:32,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10400, loss[loss=0.09572, beats_loss=0.01049, ecapa_loss=0.0001887, whisper_loss=0.08334, over 13802.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01081, ecapa_loss=0.0001596, whisper_loss=0.09097, over 3919925.25 frames. ], batch size: 57, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:57:05,597 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 01:57:37,481 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 01:57:41,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10450, loss[loss=0.1297, beats_loss=0.008733, ecapa_loss=0.0001656, whisper_loss=0.1194, over 16405.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001594, whisper_loss=0.09039, over 3891842.77 frames. ], batch size: 65, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:57:45,936 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 36 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-14 01:57:54,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. 
limit=15.0 2024-08-14 01:57:57,282 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 01:58:01,057 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 01:58:01,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.49 vs. limit=10.0 2024-08-14 01:58:16,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.463e+01 2.702e+01 3.082e+01 4.541e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-14 01:58:22,417 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 01:58:29,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2423530.0, ans=0.1 2024-08-14 01:58:35,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0 2024-08-14 01:58:37,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2024-08-14 01:58:41,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2423630.0, ans=0.0 2024-08-14 01:58:46,298 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 01:58:46,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2423730.0, ans=0.0 2024-08-14 01:58:47,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10500, loss[loss=0.1111, beats_loss=0.01072, ecapa_loss=0.0001722, whisper_loss=0.09866, over 22678.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01081, ecapa_loss=0.0001583, whisper_loss=0.09074, over 3879015.29 frames. ], batch size: 93, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:58:56,696 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 01:58:58,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2423730.0, ans=0.2 2024-08-14 01:59:24,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=22.5 2024-08-14 01:59:25,674 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-14 01:59:25,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2424030.0, ans=0.1 2024-08-14 01:59:25,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2424030.0, ans=0.125 2024-08-14 01:59:26,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2024-08-14 01:59:28,378 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 37 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 01:59:32,171 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 01:59:41,309 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 01:59:52,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10550, loss[loss=0.0881, beats_loss=0.01253, ecapa_loss=0.0001527, whisper_loss=0.07404, over 20424.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001587, whisper_loss=0.0907, over 3878745.29 frames. 
], batch size: 84, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:59:58,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2424230.0, ans=0.2 2024-08-14 02:00:10,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-14 02:00:18,223 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 02:00:19,480 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 02:00:27,175 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.316e+01 2.599e+01 2.857e+01 9.329e+01, threshold=5.198e+01, percent-clipped=3.0 2024-08-14 02:00:57,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10600, loss[loss=0.1121, beats_loss=0.01021, ecapa_loss=0.0001757, whisper_loss=0.1001, over 22440.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.00016, whisper_loss=0.09097, over 3883724.80 frames. ], batch size: 89, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:01:03,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2424730.0, ans=0.5 2024-08-14 02:01:38,560 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 02:01:41,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2425030.0, ans=0.2 2024-08-14 02:01:44,762 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 02:02:01,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10650, loss[loss=0.1079, beats_loss=0.009927, ecapa_loss=0.0001996, whisper_loss=0.09598, over 21061.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.00016, whisper_loss=0.09082, over 3857305.01 frames. ], batch size: 89, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:02:01,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2425230.0, ans=0.1 2024-08-14 02:02:07,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2425230.0, ans=0.125 2024-08-14 02:02:20,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2024-08-14 02:02:26,767 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 34 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 02:02:37,114 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.364e+01 2.670e+01 2.895e+01 4.194e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-14 02:02:46,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2425530.0, ans=0.125 2024-08-14 02:02:55,669 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-14 02:03:00,832 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 02:03:03,513 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 02:03:07,493 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10700, loss[loss=0.1159, beats_loss=0.01102, ecapa_loss=0.0001309, whisper_loss=0.1036, over 23133.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001597, whisper_loss=0.09119, over 3872952.26 frames. 
], batch size: 90, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:03:19,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2425830.0, ans=0.0 2024-08-14 02:03:24,281 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 02:03:44,324 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 02:03:48,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2426030.0, ans=0.1 2024-08-14 02:03:57,727 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 02:04:03,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2426130.0, ans=0.1 2024-08-14 02:04:13,570 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10750, loss[loss=0.1002, beats_loss=0.012, ecapa_loss=0.0001818, whisper_loss=0.08642, over 20786.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01072, ecapa_loss=0.0001601, whisper_loss=0.09158, over 3898598.42 frames. ], batch size: 89, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:04:13,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2426230.0, ans=0.0 2024-08-14 02:04:13,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2426230.0, ans=0.125 2024-08-14 02:04:17,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2426230.0, ans=0.125 2024-08-14 02:04:23,963 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 02:04:27,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2426330.0, ans=15.0 2024-08-14 02:04:36,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2024-08-14 02:04:49,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.445e+01 2.667e+01 2.966e+01 4.209e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-14 02:04:57,374 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 02:05:06,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2426630.0, ans=0.125 2024-08-14 02:05:07,942 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 02:05:15,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2426630.0, ans=0.125 2024-08-14 02:05:16,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-14 02:05:20,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10800, loss[loss=0.08782, beats_loss=0.01155, ecapa_loss=0.0001763, whisper_loss=0.0745, over 16687.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001608, whisper_loss=0.09169, over 3910810.91 frames. 
], batch size: 69, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:05:27,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2426730.0, ans=0.1 2024-08-14 02:05:46,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2426830.0, ans=0.125 2024-08-14 02:05:51,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2426930.0, ans=0.0 2024-08-14 02:05:52,609 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-14 02:06:02,591 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 28 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 02:06:09,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2427030.0, ans=0.1 2024-08-14 02:06:25,431 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 02:06:30,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2427130.0, ans=0.1 2024-08-14 02:06:31,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2427130.0, ans=0.0 2024-08-14 02:06:39,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10850, loss[loss=0.07394, beats_loss=0.009295, ecapa_loss=0.0001625, whisper_loss=0.06302, over 17414.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01064, ecapa_loss=0.0001596, whisper_loss=0.09232, over 3909413.76 frames. 
], batch size: 69, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:06:45,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2427230.0, ans=0.0 2024-08-14 02:06:50,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2427230.0, ans=0.125 2024-08-14 02:07:00,130 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 02:07:06,877 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 02:07:15,426 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 02:07:22,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.381e+01 2.677e+01 3.006e+01 4.441e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-14 02:07:27,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2427530.0, ans=0.2 2024-08-14 02:07:45,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2427630.0, ans=0.125 2024-08-14 02:07:53,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2427630.0, ans=0.95 2024-08-14 02:07:57,466 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10900, loss[loss=0.1026, beats_loss=0.01134, ecapa_loss=0.0001576, whisper_loss=0.08964, over 21599.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.000159, whisper_loss=0.09235, over 3914263.00 frames. 
], batch size: 87, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:08:04,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2427730.0, ans=6.0 2024-08-14 02:08:20,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=12.0 2024-08-14 02:08:44,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=22.5 2024-08-14 02:08:47,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2428030.0, ans=0.0 2024-08-14 02:08:49,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2024-08-14 02:09:09,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 10950, loss[loss=0.08468, beats_loss=0.01173, ecapa_loss=0.0001338, whisper_loss=0.07161, over 18040.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.0001585, whisper_loss=0.09232, over 3865453.85 frames. ], batch size: 74, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:09:13,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2428230.0, ans=0.0 2024-08-14 02:09:18,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-14 02:09:18,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.16 vs. 
limit=10.0 2024-08-14 02:09:19,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2428230.0, ans=0.1 2024-08-14 02:09:22,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.64 vs. limit=22.5 2024-08-14 02:09:26,042 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 02:09:46,056 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.387e+01 2.678e+01 3.232e+01 4.538e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-14 02:09:46,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2428430.0, ans=0.125 2024-08-14 02:09:48,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2428430.0, ans=0.125 2024-08-14 02:10:15,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2024-08-14 02:10:16,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2428730.0, ans=0.2 2024-08-14 02:10:17,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11000, loss[loss=0.1008, beats_loss=0.008169, ecapa_loss=0.0002103, whisper_loss=0.09051, over 15723.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01062, ecapa_loss=0.0001596, whisper_loss=0.09254, over 3885803.61 frames. ], batch size: 65, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:10:22,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-08-14 02:10:24,003 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
38 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 02:10:33,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2428830.0, ans=15.0 2024-08-14 02:10:34,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2428830.0, ans=0.125 2024-08-14 02:10:40,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-14 02:10:50,561 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 02:10:56,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2429030.0, ans=0.125 2024-08-14 02:10:58,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2429030.0, ans=0.125 2024-08-14 02:11:01,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2429030.0, ans=10.0 2024-08-14 02:11:05,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2429030.0, ans=0.125 2024-08-14 02:11:23,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11050, loss[loss=0.1038, beats_loss=0.008498, ecapa_loss=0.0002011, whisper_loss=0.09326, over 15079.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01058, ecapa_loss=0.0001607, whisper_loss=0.09212, over 3897918.53 frames. 
], batch size: 62, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:11:23,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2429230.0, ans=0.0 2024-08-14 02:11:25,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2429230.0, ans=0.125 2024-08-14 02:11:30,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2024-08-14 02:11:39,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2429330.0, ans=0.125 2024-08-14 02:11:39,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2429330.0, ans=0.1 2024-08-14 02:11:58,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.348e+01 2.595e+01 2.854e+01 6.191e+01, threshold=5.189e+01, percent-clipped=1.0 2024-08-14 02:12:10,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.43 vs. limit=10.0 2024-08-14 02:12:12,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2024-08-14 02:12:13,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2429530.0, ans=0.125 2024-08-14 02:12:22,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2429630.0, ans=0.125 2024-08-14 02:12:28,383 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11100, loss[loss=0.0813, beats_loss=0.01224, ecapa_loss=0.0001432, whisper_loss=0.06764, over 17172.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01062, ecapa_loss=0.0001588, whisper_loss=0.09232, over 3928165.14 frames. ], batch size: 72, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:12:31,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2429730.0, ans=0.04949747468305833 2024-08-14 02:12:36,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-14 02:12:51,973 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 02:13:03,197 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 02:13:11,469 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 02:13:14,030 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 02:13:17,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2430030.0, ans=0.0 2024-08-14 02:13:19,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2024-08-14 02:13:24,711 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 02:13:32,171 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 39 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 02:13:35,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11150, loss[loss=0.1163, beats_loss=0.009506, ecapa_loss=0.0001567, whisper_loss=0.1052, over 22929.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01057, ecapa_loss=0.0001589, whisper_loss=0.09234, over 3914617.60 frames. 
], batch size: 90, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:13:40,297 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:13:51,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2430330.0, ans=0.1 2024-08-14 02:13:59,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2430330.0, ans=0.125 2024-08-14 02:14:06,778 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 02:14:12,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.319e+01 2.556e+01 2.861e+01 3.873e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-14 02:14:15,060 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-14 02:14:16,694 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 34 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 02:14:34,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2430630.0, ans=0.0 2024-08-14 02:14:42,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2430730.0, ans=0.0 2024-08-14 02:14:43,548 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11200, loss[loss=0.09481, beats_loss=0.01133, ecapa_loss=0.0001744, whisper_loss=0.08174, over 21470.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01055, ecapa_loss=0.0001603, whisper_loss=0.0925, over 3911352.43 frames. 
], batch size: 91, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:14:46,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2430730.0, ans=0.0 2024-08-14 02:15:01,353 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-14 02:15:50,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11250, loss[loss=0.1131, beats_loss=0.01046, ecapa_loss=0.0001803, whisper_loss=0.1008, over 15018.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01056, ecapa_loss=0.0001612, whisper_loss=0.09191, over 3889958.13 frames. ], batch size: 62, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:15:51,974 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 02:16:00,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2431230.0, ans=0.0 2024-08-14 02:16:04,415 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 02:16:05,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2431330.0, ans=0.0 2024-08-14 02:16:12,541 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 02:16:13,926 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 02:16:14,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2431330.0, ans=0.125 2024-08-14 02:16:27,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.401e+01 2.755e+01 3.055e+01 4.281e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-14 02:16:37,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2431530.0, ans=0.125 2024-08-14 02:16:47,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2431630.0, ans=0.0 2024-08-14 02:16:50,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2431630.0, ans=0.1 2024-08-14 02:16:57,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11300, loss[loss=0.1036, beats_loss=0.009208, ecapa_loss=0.0001596, whisper_loss=0.09279, over 14484.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001603, whisper_loss=0.09178, over 3854217.64 frames. ], batch size: 56, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:16:59,407 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 02:17:15,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2431830.0, ans=0.125 2024-08-14 02:17:38,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2432030.0, ans=0.2 2024-08-14 02:18:04,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11350, loss[loss=0.1162, beats_loss=0.01004, ecapa_loss=0.0001529, whisper_loss=0.1046, over 22909.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.0105, ecapa_loss=0.0001614, whisper_loss=0.09229, over 3852407.70 frames. ], batch size: 90, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:18:21,171 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 02:18:21,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2432330.0, ans=0.0 2024-08-14 02:18:22,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2432330.0, ans=0.125 2024-08-14 02:18:26,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2432330.0, ans=0.0 2024-08-14 02:18:27,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2432330.0, ans=0.125 2024-08-14 02:18:29,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.07 vs. limit=10.0 2024-08-14 02:18:37,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2432430.0, ans=0.0 2024-08-14 02:18:41,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.318e+01 2.543e+01 2.878e+01 4.882e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-14 02:18:44,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2432530.0, ans=0.0 2024-08-14 02:18:59,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2432630.0, ans=0.125 2024-08-14 02:19:11,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11400, loss[loss=0.1026, beats_loss=0.01321, ecapa_loss=0.0001306, whisper_loss=0.08807, over 14173.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01053, ecapa_loss=0.000162, whisper_loss=0.09261, over 3865446.23 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:19:18,731 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 02:19:27,911 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 02:19:53,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2433030.0, ans=0.125 2024-08-14 02:19:56,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2433030.0, ans=15.0 2024-08-14 02:20:04,124 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-14 02:20:05,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2433130.0, ans=0.125 2024-08-14 02:20:08,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=12.0 2024-08-14 02:20:10,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2433130.0, ans=0.1 2024-08-14 02:20:18,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11450, loss[loss=0.08576, beats_loss=0.01351, ecapa_loss=0.0001592, whisper_loss=0.07066, over 19002.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01058, ecapa_loss=0.0001615, whisper_loss=0.09239, over 3888989.98 frames. 
], batch size: 78, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:20:30,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2433230.0, ans=0.125 2024-08-14 02:20:42,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2433330.0, ans=0.125 2024-08-14 02:20:49,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2433430.0, ans=0.0 2024-08-14 02:20:56,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2433430.0, ans=0.0 2024-08-14 02:20:56,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.469e+01 2.659e+01 2.977e+01 4.724e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 02:20:58,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2433430.0, ans=0.1 2024-08-14 02:21:07,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.70 vs. limit=15.0 2024-08-14 02:21:26,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2433730.0, ans=0.95 2024-08-14 02:21:27,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11500, loss[loss=0.1249, beats_loss=0.007466, ecapa_loss=0.0001732, whisper_loss=0.1157, over 22199.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.0001617, whisper_loss=0.09155, over 3859757.83 frames. ], batch size: 84, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:21:51,838 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 02:21:53,138 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 02:22:15,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2024-08-14 02:22:26,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2434130.0, ans=0.2 2024-08-14 02:22:34,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11550, loss[loss=0.106, beats_loss=0.01019, ecapa_loss=0.0001476, whisper_loss=0.09429, over 21987.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001635, whisper_loss=0.0919, over 3850973.56 frames. ], batch size: 86, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:22:34,300 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-14 02:22:51,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.13 vs. limit=6.0 2024-08-14 02:22:58,610 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 02:23:05,332 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 02:23:06,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2434430.0, ans=0.2 2024-08-14 02:23:07,826 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 02:23:10,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.439e+01 2.716e+01 3.080e+01 3.847e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-14 02:23:16,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2024-08-14 02:23:17,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2434530.0, ans=0.0 2024-08-14 02:23:35,117 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:23:37,730 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 02:23:41,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11600, loss[loss=0.1112, beats_loss=0.008824, ecapa_loss=0.00019, whisper_loss=0.1005, over 23693.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01058, ecapa_loss=0.0001626, whisper_loss=0.09186, over 3883283.30 frames. ], batch size: 91, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:23:42,098 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 02:24:03,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2434830.0, ans=0.2 2024-08-14 02:24:28,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2435030.0, ans=0.0 2024-08-14 02:24:28,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2435030.0, ans=10.0 2024-08-14 02:24:29,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=10.0 2024-08-14 02:24:30,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2024-08-14 02:24:51,554 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-14 02:24:51,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2435230.0, ans=0.0 2024-08-14 02:24:52,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11650, loss[loss=0.08566, beats_loss=0.01277, ecapa_loss=0.0001679, whisper_loss=0.07121, over 13886.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0106, ecapa_loss=0.0001625, whisper_loss=0.09185, over 3866050.62 frames. ], batch size: 56, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:25:02,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2435230.0, ans=0.125 2024-08-14 02:25:30,582 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 02:25:32,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.452e+01 2.685e+01 3.038e+01 4.511e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-14 02:25:49,329 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 02:26:04,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2435630.0, ans=0.0 2024-08-14 02:26:06,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11700, loss[loss=0.09052, beats_loss=0.01306, ecapa_loss=0.0001915, whisper_loss=0.07554, over 18529.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01069, ecapa_loss=0.0001623, whisper_loss=0.09248, over 3873374.07 frames. 
], batch size: 82, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:26:06,800 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 02:26:17,690 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-14 02:26:19,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2435730.0, ans=0.125 2024-08-14 02:26:34,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2024-08-14 02:26:56,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2436030.0, ans=0.0 2024-08-14 02:26:57,999 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 02:26:59,494 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 02:27:11,507 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-14 02:27:19,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. limit=10.0 2024-08-14 02:27:23,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11750, loss[loss=0.101, beats_loss=0.01179, ecapa_loss=0.0001164, whisper_loss=0.08807, over 22939.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01077, ecapa_loss=0.0001612, whisper_loss=0.0925, over 3911987.40 frames. ], batch size: 90, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:27:35,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. 
limit=15.0 2024-08-14 02:28:03,315 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.416e+01 2.659e+01 3.015e+01 1.752e+02, threshold=5.317e+01, percent-clipped=1.0 2024-08-14 02:28:03,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2436430.0, ans=0.2 2024-08-14 02:28:19,878 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 02:28:21,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2436630.0, ans=0.125 2024-08-14 02:28:21,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2436630.0, ans=0.125 2024-08-14 02:28:21,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2436630.0, ans=0.0 2024-08-14 02:28:37,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11800, loss[loss=0.08486, beats_loss=0.01132, ecapa_loss=0.0001315, whisper_loss=0.07222, over 21028.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01086, ecapa_loss=0.0001605, whisper_loss=0.0918, over 3941374.48 frames. 
], batch size: 82, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:28:55,597 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:28:55,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2436830.0, ans=0.125 2024-08-14 02:29:00,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2436830.0, ans=0.125 2024-08-14 02:29:02,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2436830.0, ans=0.125 2024-08-14 02:29:10,480 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 02:29:21,744 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 02:29:24,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-08-14 02:29:25,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2437030.0, ans=0.1 2024-08-14 02:29:31,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2437130.0, ans=0.125 2024-08-14 02:29:39,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2437130.0, ans=0.0 2024-08-14 02:29:43,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.50 vs. 
limit=15.0 2024-08-14 02:29:46,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11850, loss[loss=0.08664, beats_loss=0.01165, ecapa_loss=0.0001137, whisper_loss=0.07385, over 18578.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001597, whisper_loss=0.09129, over 3955466.88 frames. ], batch size: 71, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:29:46,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2437230.0, ans=0.2 2024-08-14 02:29:50,629 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-14 02:30:01,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2437330.0, ans=0.0 2024-08-14 02:30:07,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2437330.0, ans=0.125 2024-08-14 02:30:08,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2437330.0, ans=0.125 2024-08-14 02:30:18,268 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 02:30:23,382 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.460e+01 2.783e+01 3.243e+01 6.982e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-14 02:30:25,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2437430.0, ans=0.125 2024-08-14 02:30:26,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. 
limit=6.0 2024-08-14 02:30:32,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2437530.0, ans=0.1 2024-08-14 02:30:39,565 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 02:30:46,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2437630.0, ans=0.1 2024-08-14 02:30:50,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.61 vs. limit=10.0 2024-08-14 02:30:55,402 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11900, loss[loss=0.09956, beats_loss=0.01376, ecapa_loss=0.0001686, whisper_loss=0.08411, over 19657.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001593, whisper_loss=0.09167, over 3971509.95 frames. ], batch size: 81, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:30:56,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=22.5 2024-08-14 02:31:00,861 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 02:31:02,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2437730.0, ans=0.125 2024-08-14 02:31:16,316 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 02:31:29,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=2437930.0, ans=0.2 2024-08-14 02:31:55,518 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-14 02:31:59,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2438130.0, ans=0.125 2024-08-14 02:32:03,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 11950, loss[loss=0.09273, beats_loss=0.01367, ecapa_loss=0.0001314, whisper_loss=0.07774, over 19384.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001592, whisper_loss=0.0913, over 3924187.52 frames. ], batch size: 79, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:32:21,221 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 02:32:41,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2438430.0, ans=0.125 2024-08-14 02:32:42,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.634e+01 2.951e+01 4.369e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-14 02:32:46,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2438530.0, ans=0.125 2024-08-14 02:32:53,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2438530.0, ans=0.125 2024-08-14 02:33:15,496 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12000, loss[loss=0.1055, beats_loss=0.009946, ecapa_loss=0.0001668, whisper_loss=0.09385, over 21064.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001587, whisper_loss=0.09124, over 3912316.40 frames. 
], batch size: 84, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:33:15,498 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 02:34:00,815 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005541, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 02:34:21,451 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on SV_voxceleb1: loss=0.004448, beats_loss=0, ecapa_loss=0.0004448, whisper_loss=0, over 939242.00 frames. 2024-08-14 02:36:27,262 INFO [train_multi_KD3.py:1149] (0/4) Epoch 17, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 02:36:27,272 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 02:36:38,781 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 02:36:48,688 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 02:36:50,279 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-14 02:37:06,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=12.0 2024-08-14 02:37:15,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2439030.0, ans=0.125 2024-08-14 02:37:21,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2439130.0, ans=0.0 2024-08-14 02:37:36,021 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12050, loss[loss=0.0919, beats_loss=0.007395, ecapa_loss=0.0002319, whisper_loss=0.08219, over 14886.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01087, ecapa_loss=0.0001585, whisper_loss=0.09057, over 3877017.23 frames. 
], batch size: 57, lr: 3.65e-03, grad_scale: 1.152921504606847e+18 2024-08-14 02:37:53,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2439330.0, ans=0.0 2024-08-14 02:37:59,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2439330.0, ans=0.125 2024-08-14 02:38:05,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2439430.0, ans=0.2 2024-08-14 02:38:08,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2439430.0, ans=0.1 2024-08-14 02:38:14,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.389e+01 2.572e+01 2.864e+01 7.729e+01, threshold=5.144e+01, percent-clipped=2.0 2024-08-14 02:38:42,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-14 02:38:44,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12100, loss[loss=0.09714, beats_loss=0.01124, ecapa_loss=0.0001542, whisper_loss=0.08436, over 21649.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01087, ecapa_loss=0.0001588, whisper_loss=0.0906, over 3874227.27 frames. 
], batch size: 86, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:39:22,912 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-244000.pt 2024-08-14 02:39:30,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2440030.0, ans=0.1 2024-08-14 02:39:44,557 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 02:39:47,850 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 02:40:01,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12150, loss[loss=0.09832, beats_loss=0.009879, ecapa_loss=0.0001748, whisper_loss=0.08669, over 14482.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01088, ecapa_loss=0.0001592, whisper_loss=0.09021, over 3855160.22 frames. ], batch size: 58, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:40:12,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=12.0 2024-08-14 02:40:34,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2440430.0, ans=0.07 2024-08-14 02:40:42,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs. 
limit=15.0 2024-08-14 02:40:44,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.448e+01 2.795e+01 3.138e+01 2.484e+02, threshold=5.590e+01, percent-clipped=2.0 2024-08-14 02:41:03,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2440630.0, ans=0.125 2024-08-14 02:41:18,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12200, loss[loss=0.1169, beats_loss=0.0108, ecapa_loss=0.0001556, whisper_loss=0.1046, over 21415.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01083, ecapa_loss=0.0001589, whisper_loss=0.09061, over 3850313.60 frames. ], batch size: 87, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:41:34,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2440830.0, ans=0.125 2024-08-14 02:42:16,048 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 02:42:19,145 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 02:42:23,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2441130.0, ans=0.125 2024-08-14 02:42:33,196 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12250, loss[loss=0.1178, beats_loss=0.009059, ecapa_loss=0.0001713, whisper_loss=0.107, over 20585.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001605, whisper_loss=0.09144, over 3831981.03 frames. ], batch size: 83, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:42:55,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.93 vs. 
limit=12.0 2024-08-14 02:43:00,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.82 vs. limit=10.0 2024-08-14 02:43:14,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.529e+01 2.845e+01 3.228e+01 1.360e+02, threshold=5.691e+01, percent-clipped=2.0 2024-08-14 02:43:15,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2441530.0, ans=0.0 2024-08-14 02:43:33,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2441630.0, ans=0.0 2024-08-14 02:43:43,975 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-14 02:43:46,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12300, loss[loss=0.112, beats_loss=0.007896, ecapa_loss=0.0001704, whisper_loss=0.1024, over 19792.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001596, whisper_loss=0.09128, over 3839516.44 frames. ], batch size: 75, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:43:52,930 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 02:44:15,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2441930.0, ans=0.0 2024-08-14 02:44:24,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2441930.0, ans=0.1 2024-08-14 02:44:28,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2441930.0, ans=0.0 2024-08-14 02:44:39,597 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 02:44:41,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2442030.0, ans=0.2 2024-08-14 02:44:49,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=10.0 2024-08-14 02:44:56,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12350, loss[loss=0.1087, beats_loss=0.00822, ecapa_loss=0.0001854, whisper_loss=0.09865, over 21442.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001615, whisper_loss=0.09161, over 3840005.44 frames. ], batch size: 85, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:44:59,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2442230.0, ans=0.125 2024-08-14 02:45:02,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2442230.0, ans=0.125 2024-08-14 02:45:14,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2442330.0, ans=0.125 2024-08-14 02:45:20,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2442330.0, ans=0.125 2024-08-14 02:45:34,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.343e+01 2.707e+01 2.893e+01 7.539e+01, threshold=5.413e+01, percent-clipped=2.0 2024-08-14 02:45:54,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2442630.0, ans=0.125 2024-08-14 02:46:03,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12400, loss[loss=0.06992, beats_loss=0.01113, ecapa_loss=0.0001381, whisper_loss=0.05741, over 15061.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001607, whisper_loss=0.09129, over 3828403.95 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:46:03,295 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-14 02:46:15,038 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 02:46:16,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2442830.0, ans=0.05 2024-08-14 02:46:25,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2442830.0, ans=0.2 2024-08-14 02:46:25,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2442830.0, ans=0.1 2024-08-14 02:46:34,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2442930.0, ans=0.125 2024-08-14 02:46:36,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2442930.0, ans=0.125 2024-08-14 02:46:40,194 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 02:46:50,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2443030.0, ans=0.0 2024-08-14 02:47:04,876 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 11 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 02:47:07,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12450, loss[loss=0.09951, beats_loss=0.0113, ecapa_loss=0.0001557, whisper_loss=0.08665, over 23041.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001605, whisper_loss=0.09063, over 3813472.33 frames. 
], batch size: 94, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:47:07,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2443230.0, ans=0.125 2024-08-14 02:47:17,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2443230.0, ans=0.0 2024-08-14 02:47:22,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2443330.0, ans=0.2 2024-08-14 02:47:27,148 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 02:47:30,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2443330.0, ans=0.125 2024-08-14 02:47:38,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2024-08-14 02:47:39,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2443430.0, ans=0.1 2024-08-14 02:47:44,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.385e+01 2.629e+01 3.074e+01 4.896e+01, threshold=5.258e+01, percent-clipped=0.0 2024-08-14 02:47:44,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2443430.0, ans=0.2 2024-08-14 02:48:06,281 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
17 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 02:48:10,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2443630.0, ans=0.0 2024-08-14 02:48:11,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2443730.0, ans=0.125 2024-08-14 02:48:12,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12500, loss[loss=0.1065, beats_loss=0.0102, ecapa_loss=0.0001746, whisper_loss=0.0946, over 21124.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001592, whisper_loss=0.09, over 3841933.82 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:48:17,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=12.0 2024-08-14 02:48:32,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-14 02:48:49,061 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 02:48:49,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2443930.0, ans=0.2 2024-08-14 02:49:02,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2444030.0, ans=0.05 2024-08-14 02:49:04,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.50 vs. limit=10.0 2024-08-14 02:49:17,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12550, loss[loss=0.1122, beats_loss=0.01003, ecapa_loss=0.0002019, whisper_loss=0.1001, over 21907.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001592, whisper_loss=0.09075, over 3866695.86 frames. ], batch size: 93, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:49:26,920 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 02:49:29,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2444330.0, ans=0.125 2024-08-14 02:49:34,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2024-08-14 02:49:38,998 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 02:49:54,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.347e+01 2.679e+01 3.056e+01 5.301e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-14 02:50:11,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2444630.0, ans=0.125 2024-08-14 02:50:15,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-14 02:50:19,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2444630.0, ans=15.0 2024-08-14 02:50:22,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12600, loss[loss=0.09571, beats_loss=0.01182, ecapa_loss=0.000148, whisper_loss=0.08241, over 15804.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001595, whisper_loss=0.09092, over 3882585.14 frames. ], batch size: 62, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:50:25,543 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
16 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 02:50:29,067 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 02:50:43,123 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 02:50:53,482 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 02:51:11,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2445030.0, ans=0.1 2024-08-14 02:51:14,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2445130.0, ans=0.125 2024-08-14 02:51:27,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12650, loss[loss=0.09668, beats_loss=0.01018, ecapa_loss=0.0001615, whisper_loss=0.08489, over 15941.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01085, ecapa_loss=0.0001597, whisper_loss=0.09001, over 3868586.92 frames. ], batch size: 63, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:51:31,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=2445230.0, ans=0.02 2024-08-14 02:51:34,907 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 02:51:44,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2445330.0, ans=0.09899494936611666 2024-08-14 02:51:50,693 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-14 02:51:52,254 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 02:52:01,686 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.103e-01 2024-08-14 02:52:03,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.374e+01 2.633e+01 2.976e+01 1.427e+02, threshold=5.265e+01, percent-clipped=1.0 2024-08-14 02:52:08,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2445530.0, ans=0.125 2024-08-14 02:52:21,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2445630.0, ans=0.0 2024-08-14 02:52:26,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2445630.0, ans=0.125 2024-08-14 02:52:32,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12700, loss[loss=0.1013, beats_loss=0.01289, ecapa_loss=0.000161, whisper_loss=0.0868, over 21980.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01092, ecapa_loss=0.0001601, whisper_loss=0.08969, over 3899620.08 frames. ], batch size: 90, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:52:39,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2445730.0, ans=0.1 2024-08-14 02:52:47,942 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 02:53:01,523 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 02:53:07,110 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 02:53:34,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=15.0 2024-08-14 02:53:37,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12750, loss[loss=0.09741, beats_loss=0.01125, ecapa_loss=0.0002021, whisper_loss=0.08414, over 16612.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01089, ecapa_loss=0.0001611, whisper_loss=0.0899, over 3894690.10 frames. ], batch size: 70, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:53:48,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5 2024-08-14 02:53:55,785 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 02:53:56,018 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:54:01,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2446330.0, ans=0.125 2024-08-14 02:54:14,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.458e+01 2.855e+01 3.170e+01 1.362e+02, threshold=5.709e+01, percent-clipped=3.0 2024-08-14 02:54:41,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2446730.0, ans=0.05 2024-08-14 02:54:42,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12800, loss[loss=0.1071, beats_loss=0.01103, ecapa_loss=0.0001744, whisper_loss=0.09434, over 21063.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01092, ecapa_loss=0.0001615, whisper_loss=0.09027, over 3906951.56 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:54:46,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2446730.0, ans=0.125 2024-08-14 02:54:50,185 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 02:54:58,377 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 02:55:20,365 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 02:55:41,288 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 02:55:47,791 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12850, loss[loss=0.1134, beats_loss=0.009387, ecapa_loss=0.0001712, whisper_loss=0.1023, over 16086.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01088, ecapa_loss=0.0001625, whisper_loss=0.09021, over 3874034.06 frames. ], batch size: 63, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:55:53,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2447230.0, ans=0.0 2024-08-14 02:56:07,492 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 02:56:09,803 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 02:56:10,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2447330.0, ans=0.0 2024-08-14 02:56:12,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2447430.0, ans=0.0 2024-08-14 02:56:23,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.741e+01 3.118e+01 1.301e+02, threshold=5.482e+01, percent-clipped=1.0 2024-08-14 02:56:38,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2447630.0, ans=0.125 2024-08-14 02:56:42,529 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 02:56:50,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2447630.0, ans=0.125 2024-08-14 02:56:53,058 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12900, loss[loss=0.09937, beats_loss=0.01252, ecapa_loss=0.0001587, whisper_loss=0.08527, over 18680.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01093, ecapa_loss=0.0001623, whisper_loss=0.08919, over 3824773.89 frames. ], batch size: 75, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:57:03,690 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:57:17,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2447830.0, ans=0.125 2024-08-14 02:57:31,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2447930.0, ans=0.1 2024-08-14 02:57:45,401 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-14 02:57:49,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2448130.0, ans=0.0 2024-08-14 02:57:56,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2448130.0, ans=0.0 2024-08-14 02:58:00,455 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 02:58:01,628 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 12950, loss[loss=0.1078, beats_loss=0.0103, ecapa_loss=0.0001288, whisper_loss=0.09624, over 20942.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01083, ecapa_loss=0.0001611, whisper_loss=0.08968, over 3840795.17 frames. 
], batch size: 80, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:58:09,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2448230.0, ans=0.125 2024-08-14 02:58:22,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2448330.0, ans=0.2 2024-08-14 02:58:28,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=2448330.0, ans=12.0 2024-08-14 02:58:31,331 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 02:58:33,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2448430.0, ans=0.95 2024-08-14 02:58:41,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.282e+01 2.587e+01 2.877e+01 4.043e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 02:58:48,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2448530.0, ans=0.0 2024-08-14 02:58:55,270 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 02:58:55,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2448530.0, ans=0.0 2024-08-14 02:59:00,788 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 31 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 02:59:12,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13000, loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001572, whisper_loss=0.09056, over 14677.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001612, whisper_loss=0.09052, over 3853452.68 frames. 
], batch size: 56, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:59:12,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=6.0 2024-08-14 02:59:13,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2448730.0, ans=0.5 2024-08-14 02:59:21,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2024-08-14 02:59:30,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2448830.0, ans=0.1 2024-08-14 02:59:34,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2448830.0, ans=0.1 2024-08-14 02:59:42,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2448930.0, ans=0.125 2024-08-14 02:59:43,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2448930.0, ans=0.0 2024-08-14 03:00:01,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2449030.0, ans=0.0 2024-08-14 03:00:13,448 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 03:00:27,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13050, loss[loss=0.08472, beats_loss=0.01366, ecapa_loss=0.0001362, whisper_loss=0.0697, over 23001.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001616, whisper_loss=0.09058, over 3859981.31 frames. 
], batch size: 95, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:00:39,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2449230.0, ans=0.2 2024-08-14 03:01:01,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.23 vs. limit=22.5 2024-08-14 03:01:17,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.70 vs. limit=22.5 2024-08-14 03:01:18,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.546e+01 2.785e+01 3.142e+01 1.124e+02, threshold=5.570e+01, percent-clipped=2.0 2024-08-14 03:01:20,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-08-14 03:02:03,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13100, loss[loss=0.1089, beats_loss=0.01245, ecapa_loss=0.0001601, whisper_loss=0.09489, over 21216.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001609, whisper_loss=0.09038, over 3886058.24 frames. ], batch size: 86, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:02:08,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2449730.0, ans=0.125 2024-08-14 03:02:26,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2449830.0, ans=10.0 2024-08-14 03:02:30,925 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 03:03:07,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2450030.0, ans=0.2 2024-08-14 03:03:17,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2450030.0, ans=0.125 2024-08-14 03:03:33,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2450130.0, ans=0.125 2024-08-14 03:03:45,164 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 03:03:53,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13150, loss[loss=0.09079, beats_loss=0.01196, ecapa_loss=0.0001572, whisper_loss=0.07726, over 19484.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.000159, whisper_loss=0.09029, over 3886480.26 frames. ], batch size: 77, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:04:09,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2450230.0, ans=0.025 2024-08-14 03:04:14,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2450230.0, ans=0.07 2024-08-14 03:04:26,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2450330.0, ans=0.2 2024-08-14 03:04:37,226 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
25 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 03:04:51,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2450430.0, ans=0.125 2024-08-14 03:05:09,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.343e+01 2.636e+01 2.918e+01 3.888e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 03:05:09,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2450430.0, ans=0.125 2024-08-14 03:05:18,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2450530.0, ans=0.125 2024-08-14 03:05:26,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.94 vs. limit=22.5 2024-08-14 03:05:32,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2024-08-14 03:05:46,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2450630.0, ans=0.125 2024-08-14 03:05:48,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2450630.0, ans=0.07 2024-08-14 03:06:08,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13200, loss[loss=0.08346, beats_loss=0.01192, ecapa_loss=0.0001328, whisper_loss=0.07021, over 13735.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001595, whisper_loss=0.08968, over 3851122.48 frames. 
], batch size: 53, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:06:14,740 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:06:38,190 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 03:07:03,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2024-08-14 03:07:10,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2450930.0, ans=0.1 2024-08-14 03:07:11,901 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 03:07:15,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2024-08-14 03:07:56,304 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 9 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 03:08:02,940 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-14 03:08:09,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2451130.0, ans=0.125 2024-08-14 03:08:14,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2451230.0, ans=10.0 2024-08-14 03:08:16,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13250, loss[loss=0.1113, beats_loss=0.01169, ecapa_loss=0.0001353, whisper_loss=0.09826, over 22407.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001595, whisper_loss=0.09066, over 3853939.60 frames. 
], batch size: 85, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:08:49,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2451330.0, ans=0.1 2024-08-14 03:09:05,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2451430.0, ans=0.125 2024-08-14 03:09:11,009 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 03:09:25,015 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 03:09:27,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.488e+01 2.774e+01 3.161e+01 6.895e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 03:09:34,621 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 03:09:35,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-14 03:09:58,069 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 03:10:00,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2451730.0, ans=0.125 2024-08-14 03:10:01,990 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13300, loss[loss=0.1069, beats_loss=0.01033, ecapa_loss=0.0001752, whisper_loss=0.09486, over 20593.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001579, whisper_loss=0.09128, over 3867495.46 frames. ], batch size: 84, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:10:13,836 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 03:10:46,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2451930.0, ans=0.1 2024-08-14 03:10:53,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2452030.0, ans=0.1 2024-08-14 03:10:55,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.76 vs. limit=22.5 2024-08-14 03:11:24,021 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:11:26,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13350, loss[loss=0.1157, beats_loss=0.006127, ecapa_loss=0.0001897, whisper_loss=0.1077, over 15196.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001575, whisper_loss=0.09133, over 3902759.15 frames. ], batch size: 61, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:11:28,480 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 03:11:37,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-08-14 03:11:40,276 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 03:11:42,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2024-08-14 03:11:49,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.06 vs. limit=22.5 2024-08-14 03:11:52,249 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 03:12:02,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2024-08-14 03:12:08,622 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 03:12:10,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2452430.0, ans=0.125 2024-08-14 03:12:10,332 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:12:10,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2452430.0, ans=0.0 2024-08-14 03:12:12,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.377e+01 2.695e+01 3.024e+01 3.722e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-14 03:12:14,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2452530.0, ans=0.025 2024-08-14 03:12:16,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2452530.0, ans=0.0 2024-08-14 03:12:19,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=12.0 2024-08-14 03:12:34,994 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-14 03:12:47,159 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 03:12:48,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13400, loss[loss=0.1098, beats_loss=0.01086, ecapa_loss=0.0001297, whisper_loss=0.09768, over 20857.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001586, whisper_loss=0.09106, over 3889729.96 frames. ], batch size: 80, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:12:50,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2452730.0, ans=0.0 2024-08-14 03:12:54,554 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 03:13:08,011 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 34 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 03:13:15,732 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 03:13:15,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2452830.0, ans=0.0 2024-08-14 03:13:25,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2452930.0, ans=0.0 2024-08-14 03:13:35,374 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 35 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 03:13:36,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2453030.0, ans=0.125 2024-08-14 03:13:39,143 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06988123059272766, model_norm_threshold=53.90802001953125 2024-08-14 03:13:39,683 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.512e+05, grad_sumsq=1.512e+05, orig_rms_sq=1.000e+00 2024-08-14 03:13:41,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2453030.0, ans=0.0 2024-08-14 03:13:50,179 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
15 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 03:14:06,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2453130.0, ans=0.125 2024-08-14 03:14:09,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13450, loss[loss=0.07588, beats_loss=0.01114, ecapa_loss=0.0001641, whisper_loss=0.0631, over 17396.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001573, whisper_loss=0.09079, over 3939187.94 frames. ], batch size: 73, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:14:16,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2453230.0, ans=0.125 2024-08-14 03:14:17,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2453230.0, ans=0.2 2024-08-14 03:14:38,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-14 03:14:55,078 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.489e+01 2.722e+01 3.204e+01 7.714e+02, threshold=5.444e+01, percent-clipped=1.0 2024-08-14 03:15:04,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0 2024-08-14 03:15:06,347 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 03:15:07,760 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 03:15:16,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2453630.0, ans=0.1 2024-08-14 03:15:27,219 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13500, loss[loss=0.1048, beats_loss=0.01078, ecapa_loss=0.0001508, whisper_loss=0.09252, over 22251.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001586, whisper_loss=0.0911, over 3951098.79 frames. ], batch size: 88, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:15:41,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2453830.0, ans=0.125 2024-08-14 03:16:02,381 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 26 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-14 03:16:09,712 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 03:16:22,020 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 03:16:36,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13550, loss[loss=0.1138, beats_loss=0.009996, ecapa_loss=0.0001606, whisper_loss=0.1022, over 23447.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0108, ecapa_loss=0.0001586, whisper_loss=0.09054, over 3927735.10 frames. ], batch size: 94, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:16:46,867 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 03:16:54,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. 
limit=10.0 2024-08-14 03:16:55,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2454330.0, ans=0.0 2024-08-14 03:16:56,326 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 03:16:58,568 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 03:16:59,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2024-08-14 03:17:07,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.06 vs. limit=6.0 2024-08-14 03:17:12,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.332e+01 2.621e+01 2.776e+01 5.086e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-14 03:17:13,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-08-14 03:17:33,527 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 03:17:34,893 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 03:17:41,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13600, loss[loss=0.09485, beats_loss=0.01115, ecapa_loss=0.0002106, whisper_loss=0.08159, over 21663.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.0001584, whisper_loss=0.09014, over 3888417.09 frames. ], batch size: 94, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:17:42,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. 
limit=15.0 2024-08-14 03:17:43,907 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 03:17:53,005 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 03:18:46,907 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13650, loss[loss=0.1273, beats_loss=0.009701, ecapa_loss=0.0001496, whisper_loss=0.1161, over 20883.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01078, ecapa_loss=0.0001592, whisper_loss=0.09121, over 3901328.72 frames. ], batch size: 78, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:18:48,523 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 34 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-14 03:18:51,457 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 03:18:57,118 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 03:19:14,331 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:19:24,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.360e+01 2.649e+01 3.081e+01 1.605e+02, threshold=5.298e+01, percent-clipped=1.0 2024-08-14 03:19:46,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2455630.0, ans=0.09899494936611666 2024-08-14 03:19:57,153 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13700, loss[loss=0.1094, beats_loss=0.006574, ecapa_loss=0.0001737, whisper_loss=0.1011, over 18053.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001602, whisper_loss=0.09152, over 3917466.15 frames. 
], batch size: 69, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:20:02,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-14 03:20:47,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2456030.0, ans=0.05 2024-08-14 03:20:51,957 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-14 03:21:10,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13750, loss[loss=0.09352, beats_loss=0.008609, ecapa_loss=0.0001697, whisper_loss=0.08321, over 16354.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0107, ecapa_loss=0.0001602, whisper_loss=0.09246, over 3884145.82 frames. ], batch size: 65, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:21:11,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2456230.0, ans=0.125 2024-08-14 03:21:22,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2456230.0, ans=0.125 2024-08-14 03:21:54,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.289e+01 2.530e+01 2.894e+01 7.886e+01, threshold=5.061e+01, percent-clipped=1.0 2024-08-14 03:22:01,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2456530.0, ans=0.125 2024-08-14 03:22:13,236 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 03:22:13,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2456630.0, ans=22.5 2024-08-14 03:22:14,965 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 03:22:28,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13800, loss[loss=0.1071, beats_loss=0.01246, ecapa_loss=0.0001727, whisper_loss=0.0929, over 16879.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001608, whisper_loss=0.09186, over 3857144.77 frames. ], batch size: 69, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:22:30,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0 2024-08-14 03:22:31,461 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 03:22:31,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2456730.0, ans=0.1 2024-08-14 03:22:34,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2456730.0, ans=0.025 2024-08-14 03:22:35,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2456730.0, ans=0.125 2024-08-14 03:22:47,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=22.5 2024-08-14 03:22:48,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2456830.0, ans=0.07 2024-08-14 03:23:13,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2024-08-14 03:23:17,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.99 vs. 
limit=12.0 2024-08-14 03:23:18,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2457030.0, ans=15.0 2024-08-14 03:23:19,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2457030.0, ans=0.0 2024-08-14 03:23:47,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2457230.0, ans=0.125 2024-08-14 03:23:48,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13850, loss[loss=0.1165, beats_loss=0.009973, ecapa_loss=0.0001971, whisper_loss=0.1046, over 19728.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001611, whisper_loss=0.09184, over 3888659.15 frames. ], batch size: 76, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:23:55,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.69 vs. limit=22.5 2024-08-14 03:24:18,828 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-14 03:24:22,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-14 03:24:27,059 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-14 03:24:35,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.485e+01 2.798e+01 3.130e+01 4.713e+02, threshold=5.595e+01, percent-clipped=2.0 2024-08-14 03:24:41,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2457530.0, ans=0.2 2024-08-14 03:24:42,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2457530.0, ans=0.125 2024-08-14 03:25:07,020 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 23 from LS+wenet, 8 from Vox, 22 fro AS 2024-08-14 03:25:11,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13900, loss[loss=0.1062, beats_loss=0.009111, ecapa_loss=0.0001755, whisper_loss=0.09537, over 13772.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01056, ecapa_loss=0.0001603, whisper_loss=0.09295, over 3877830.23 frames. ], batch size: 54, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:25:50,737 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 03:26:12,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2458030.0, ans=0.0 2024-08-14 03:26:15,229 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 03:26:21,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2458130.0, ans=0.125 2024-08-14 03:26:29,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.66 vs. limit=22.5 2024-08-14 03:26:32,387 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 03:26:34,147 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 13950, loss[loss=0.1024, beats_loss=0.01205, ecapa_loss=0.0001572, whisper_loss=0.08881, over 17121.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01058, ecapa_loss=0.000159, whisper_loss=0.09233, over 3860243.36 frames. ], batch size: 68, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:26:42,467 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 03:26:51,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2458330.0, ans=0.05 2024-08-14 03:26:53,439 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-14 03:27:06,164 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 03:27:19,775 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.317e+01 2.641e+01 2.864e+01 9.900e+01, threshold=5.282e+01, percent-clipped=1.0 2024-08-14 03:27:27,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2458530.0, ans=0.2 2024-08-14 03:27:30,598 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 03:27:39,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2024-08-14 03:27:52,770 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14000, loss[loss=0.1135, beats_loss=0.01032, ecapa_loss=0.0001192, whisper_loss=0.102, over 18251.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01063, ecapa_loss=0.0001583, whisper_loss=0.09218, over 3852413.88 frames. 
], batch size: 68, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:27:56,498 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-14 03:28:17,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2458830.0, ans=0.125 2024-08-14 03:28:19,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2458830.0, ans=0.1 2024-08-14 03:28:38,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.02 vs. limit=22.5 2024-08-14 03:28:43,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2459030.0, ans=0.125 2024-08-14 03:28:51,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2459030.0, ans=0.125 2024-08-14 03:28:54,123 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-14 03:29:06,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2459130.0, ans=0.2 2024-08-14 03:29:12,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14050, loss[loss=0.07865, beats_loss=0.008211, ecapa_loss=0.0001926, whisper_loss=0.06852, over 15913.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001586, whisper_loss=0.09142, over 3852640.04 frames. 
], batch size: 62, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:29:19,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2459230.0, ans=0.125 2024-08-14 03:29:23,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-14 03:29:23,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=12.0 2024-08-14 03:29:31,402 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 03:29:31,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2459330.0, ans=0.0 2024-08-14 03:29:51,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2459430.0, ans=0.125 2024-08-14 03:29:58,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.432e+01 2.589e+01 2.887e+01 3.706e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 03:30:02,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.50 vs. limit=22.5 2024-08-14 03:30:03,598 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 03:30:10,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2459530.0, ans=0.125 2024-08-14 03:30:13,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2459630.0, ans=10.0 2024-08-14 03:30:27,610 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
23 from LS+wenet, 21 from Vox, 50 fro AS 2024-08-14 03:30:31,875 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14100, loss[loss=0.0936, beats_loss=0.01016, ecapa_loss=0.000159, whisper_loss=0.08185, over 20457.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001588, whisper_loss=0.09123, over 3860388.40 frames. ], batch size: 84, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:30:33,319 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 03:30:33,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0 2024-08-14 03:30:44,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2459730.0, ans=0.1 2024-08-14 03:30:48,493 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 03:31:03,994 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 03:31:17,425 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 03:31:26,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2460030.0, ans=0.1 2024-08-14 03:31:28,133 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 03:31:33,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2460030.0, ans=0.0 2024-08-14 03:31:33,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2460030.0, ans=0.125 2024-08-14 03:31:52,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14150, loss[loss=0.1145, beats_loss=0.01009, ecapa_loss=0.0001734, whisper_loss=0.1027, over 15590.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001576, whisper_loss=0.09114, over 3872226.55 frames. ], batch size: 62, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:31:56,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2460230.0, ans=0.125 2024-08-14 03:32:08,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2460330.0, ans=0.1 2024-08-14 03:32:18,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2460330.0, ans=0.0 2024-08-14 03:32:20,574 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 03:32:36,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2460430.0, ans=0.0 2024-08-14 03:32:40,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.331e+01 2.560e+01 2.927e+01 4.747e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-14 03:33:08,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2460630.0, ans=0.5 2024-08-14 03:33:16,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14200, loss[loss=0.1034, beats_loss=0.01218, ecapa_loss=0.0001478, whisper_loss=0.08971, over 23145.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001581, whisper_loss=0.09137, over 3858181.94 frames. ], batch size: 92, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:33:35,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2460830.0, ans=0.125 2024-08-14 03:33:35,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2460830.0, ans=0.0 2024-08-14 03:33:49,595 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 03:33:52,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.10 vs. 
limit=22.5 2024-08-14 03:34:22,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2461130.0, ans=0.125 2024-08-14 03:34:32,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2461130.0, ans=0.2 2024-08-14 03:34:39,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2461230.0, ans=0.0 2024-08-14 03:34:40,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14250, loss[loss=0.09841, beats_loss=0.01106, ecapa_loss=0.0001816, whisper_loss=0.08553, over 21834.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01073, ecapa_loss=0.0001583, whisper_loss=0.09159, over 3861339.61 frames. ], batch size: 93, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:34:53,817 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 35 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 03:35:07,716 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 03:35:11,639 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:35:13,454 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 03:35:13,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2461430.0, ans=0.0 2024-08-14 03:35:14,701 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 03:35:16,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2461430.0, ans=0.5 2024-08-14 03:35:25,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.288e+01 2.518e+01 2.897e+01 5.060e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-14 03:35:47,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2461630.0, ans=0.125 2024-08-14 03:35:49,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2461630.0, ans=0.2 2024-08-14 03:35:59,762 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14300, loss[loss=0.09811, beats_loss=0.01079, ecapa_loss=0.0001805, whisper_loss=0.08551, over 21805.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001576, whisper_loss=0.09136, over 3875202.02 frames. ], batch size: 92, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:36:03,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2461730.0, ans=0.1 2024-08-14 03:36:09,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2461730.0, ans=0.125 2024-08-14 03:36:42,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2461930.0, ans=0.125 2024-08-14 03:36:54,609 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 03:36:55,783 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 03:37:02,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2462130.0, ans=0.0 2024-08-14 03:37:10,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2462130.0, ans=0.0 2024-08-14 03:37:18,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14350, loss[loss=0.05971, beats_loss=0.01596, ecapa_loss=0.0001295, whisper_loss=0.04246, over 14248.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.0001578, whisper_loss=0.09112, over 3881529.55 frames. ], batch size: 60, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:37:18,457 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 03:37:25,013 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 03:37:34,966 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 03:37:35,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2462330.0, ans=0.125 2024-08-14 03:37:46,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2024-08-14 03:37:47,727 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 03:37:51,703 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 03:37:51,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2462430.0, ans=0.125 2024-08-14 03:38:01,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2462430.0, ans=0.05 2024-08-14 03:38:03,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.518e+01 2.731e+01 3.066e+01 7.073e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-14 03:38:08,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2462530.0, ans=0.0 2024-08-14 03:38:11,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2462530.0, ans=0.0 2024-08-14 03:38:30,984 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 03:38:32,792 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 03:38:34,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2462630.0, ans=0.1 2024-08-14 03:38:36,971 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14400, loss[loss=0.08855, beats_loss=0.01295, ecapa_loss=0.0001503, whisper_loss=0.07409, over 23016.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001565, whisper_loss=0.09095, over 3885929.27 frames. ], batch size: 89, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:38:39,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. 
limit=15.0 2024-08-14 03:39:08,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-08-14 03:39:19,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2462930.0, ans=0.125 2024-08-14 03:39:53,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 17, batch 14450, loss[loss=0.1148, beats_loss=0.008375, ecapa_loss=0.0001621, whisper_loss=0.1048, over 22107.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001577, whisper_loss=0.09133, over 3913244.22 frames. ], batch size: 87, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:40:12,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-14 03:40:12,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-14 03:40:18,525 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 03:40:25,857 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 03:40:30,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2463430.0, ans=15.0 2024-08-14 03:40:31,233 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
23 from LS+wenet, 9 from Vox, 22 fro AS 2024-08-14 03:40:32,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2463430.0, ans=0.1 2024-08-14 03:40:40,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.412e+01 2.642e+01 2.928e+01 4.301e+01, threshold=5.284e+01, percent-clipped=0.0 2024-08-14 03:40:43,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0 2024-08-14 03:40:48,892 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 03:40:49,069 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:40:59,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2463630.0, ans=0.0 2024-08-14 03:41:01,877 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 03:41:02,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. limit=10.0 2024-08-14 03:41:10,298 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-17.pt 2024-08-14 03:41:52,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 0, loss[loss=0.1036, beats_loss=0.008828, ecapa_loss=0.0001776, whisper_loss=0.09301, over 20617.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.008828, ecapa_loss=0.0001776, whisper_loss=0.09301, over 20617.00 frames. 
], batch size: 84, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:41:52,895 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 03:42:32,745 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005528, whisper_loss=0.2483, over 922467.00 frames. 2024-08-14 03:42:48,671 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on SV_voxceleb1: loss=0.004396, beats_loss=0, ecapa_loss=0.0004396, whisper_loss=0, over 939242.00 frames. 2024-08-14 03:44:02,235 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0885, 3.2543, 3.3823, 3.0903], device='cuda:0') 2024-08-14 03:44:34,601 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2017, 2.9307, 2.9752, 2.8097], device='cuda:0') 2024-08-14 03:44:37,191 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 03:44:37,194 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 03:45:11,144 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 03:45:50,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2464020.0, ans=0.0 2024-08-14 03:46:16,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2464120.0, ans=0.125 2024-08-14 03:46:19,066 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 03:46:40,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 50, loss[loss=0.1103, beats_loss=0.01162, ecapa_loss=0.0001575, whisper_loss=0.09716, over 22212.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.009897, ecapa_loss=0.0001647, whisper_loss=0.09094, over 882469.48 frames. ], batch size: 88, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:46:47,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2464220.0, ans=0.125 2024-08-14 03:46:49,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2464220.0, ans=0.0 2024-08-14 03:46:53,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2464220.0, ans=0.125 2024-08-14 03:47:47,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.625e+01 2.934e+01 3.274e+01 1.725e+02, threshold=5.869e+01, percent-clipped=1.0 2024-08-14 03:47:50,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2464520.0, ans=0.2 2024-08-14 03:48:09,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-08-14 03:48:31,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 100, loss[loss=0.09331, beats_loss=0.009174, ecapa_loss=0.0001414, whisper_loss=0.08273, over 17009.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01001, ecapa_loss=0.0001618, whisper_loss=0.08941, over 1533887.01 frames. ], batch size: 63, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:48:48,702 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 28 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-14 03:48:59,837 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:49:11,230 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 03:49:24,695 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 03:49:25,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.00 vs. limit=6.0 2024-08-14 03:49:27,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-14 03:49:37,024 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-14 03:49:42,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2465020.0, ans=0.125 2024-08-14 03:49:52,962 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 03:50:11,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-14 03:50:14,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 150, loss[loss=0.0742, beats_loss=0.01384, ecapa_loss=0.0001904, whisper_loss=0.05846, over 14964.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009871, ecapa_loss=0.0001607, whisper_loss=0.09034, over 2017905.18 frames. 
], batch size: 63, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:50:16,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2465220.0, ans=0.0 2024-08-14 03:50:23,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2465220.0, ans=0.1 2024-08-14 03:50:33,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2465320.0, ans=0.125 2024-08-14 03:50:39,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2465320.0, ans=0.2 2024-08-14 03:50:40,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2465320.0, ans=0.125 2024-08-14 03:51:02,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.708e+01 3.001e+01 3.363e+01 1.526e+02, threshold=6.002e+01, percent-clipped=2.0 2024-08-14 03:51:15,747 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 03:51:33,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 200, loss[loss=0.1026, beats_loss=0.008314, ecapa_loss=0.0001689, whisper_loss=0.09264, over 14740.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.009944, ecapa_loss=0.0001608, whisper_loss=0.09164, over 2453865.02 frames. ], batch size: 56, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:51:37,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-14 03:51:38,975 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:51:52,615 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 03:51:52,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2465820.0, ans=0.025 2024-08-14 03:52:04,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2465920.0, ans=0.0 2024-08-14 03:52:06,804 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 03:52:11,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2465920.0, ans=0.0 2024-08-14 03:52:11,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2465920.0, ans=0.1 2024-08-14 03:52:13,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2465920.0, ans=0.125 2024-08-14 03:52:25,153 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 03:52:55,051 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 250, loss[loss=0.1057, beats_loss=0.009093, ecapa_loss=0.0001449, whisper_loss=0.09519, over 14574.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01007, ecapa_loss=0.0001607, whisper_loss=0.09277, over 2759781.37 frames. 
], batch size: 53, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:52:55,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2466220.0, ans=0.0 2024-08-14 03:53:25,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2466320.0, ans=0.125 2024-08-14 03:53:46,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.439e+01 2.692e+01 3.141e+01 8.859e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-14 03:53:46,459 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 03:53:57,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2466520.0, ans=0.0 2024-08-14 03:54:01,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2466520.0, ans=0.125 2024-08-14 03:54:19,674 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 300, loss[loss=0.1245, beats_loss=0.01076, ecapa_loss=0.0001884, whisper_loss=0.1118, over 21743.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01015, ecapa_loss=0.0001618, whisper_loss=0.09295, over 3002058.80 frames. ], batch size: 88, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:54:21,856 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 03:54:33,454 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
23 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-14 03:54:52,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2466920.0, ans=0.2 2024-08-14 03:55:03,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2466920.0, ans=0.125 2024-08-14 03:55:08,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2467020.0, ans=0.125 2024-08-14 03:55:21,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2467020.0, ans=0.2 2024-08-14 03:55:27,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2467120.0, ans=0.125 2024-08-14 03:55:39,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 350, loss[loss=0.07276, beats_loss=0.01371, ecapa_loss=0.0001674, whisper_loss=0.05737, over 17222.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001612, whisper_loss=0.0909, over 3157693.60 frames. ], batch size: 72, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:55:44,749 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 03:56:17,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2467420.0, ans=0.125 2024-08-14 03:56:18,396 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 03:56:24,367 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
24 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-14 03:56:25,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.373e+01 2.538e+01 2.756e+01 1.193e+02, threshold=5.077e+01, percent-clipped=2.0 2024-08-14 03:56:30,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2467520.0, ans=0.1 2024-08-14 03:56:39,175 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 03:56:39,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2467620.0, ans=0.0 2024-08-14 03:56:41,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=2467620.0, ans=12.0 2024-08-14 03:56:55,327 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 400, loss[loss=0.09366, beats_loss=0.01079, ecapa_loss=0.0001301, whisper_loss=0.08157, over 19322.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001591, whisper_loss=0.09035, over 3278956.75 frames. ], batch size: 74, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:57:16,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2467820.0, ans=0.125 2024-08-14 03:57:19,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2467820.0, ans=0.1 2024-08-14 03:57:31,531 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-14 03:57:47,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2468020.0, ans=0.1 2024-08-14 03:57:48,899 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 03:57:56,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2468120.0, ans=0.125 2024-08-14 03:57:58,412 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 03:58:08,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2468120.0, ans=0.125 2024-08-14 03:58:11,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 450, loss[loss=0.1205, beats_loss=0.007871, ecapa_loss=0.0001669, whisper_loss=0.111, over 13795.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001596, whisper_loss=0.09103, over 3376393.07 frames. ], batch size: 53, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:58:11,690 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 14 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 03:58:23,319 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 03:58:36,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2024-08-14 03:58:44,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2024-08-14 03:58:49,660 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
19 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 03:58:57,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.249e+01 2.491e+01 2.829e+01 3.988e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-14 03:59:00,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2468520.0, ans=0.015 2024-08-14 03:59:01,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2468520.0, ans=0.125 2024-08-14 03:59:01,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2468520.0, ans=0.1 2024-08-14 03:59:13,732 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 03:59:27,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2468720.0, ans=0.0 2024-08-14 03:59:28,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 500, loss[loss=0.09722, beats_loss=0.01266, ecapa_loss=0.0001434, whisper_loss=0.08313, over 14386.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.000158, whisper_loss=0.09049, over 3483110.87 frames. ], batch size: 60, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:59:30,859 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 03:59:46,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2468820.0, ans=0.125 2024-08-14 03:59:54,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2468820.0, ans=0.0 2024-08-14 03:59:54,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.36 vs. 
limit=22.5 2024-08-14 03:59:55,910 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 04:00:03,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2468920.0, ans=10.0 2024-08-14 04:00:09,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2468920.0, ans=0.125 2024-08-14 04:00:31,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-14 04:00:31,990 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 04:00:35,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2469120.0, ans=0.125 2024-08-14 04:00:42,939 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-14 04:00:45,695 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 550, loss[loss=0.1146, beats_loss=0.01144, ecapa_loss=0.0001557, whisper_loss=0.1016, over 22223.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001582, whisper_loss=0.09051, over 3602367.88 frames. ], batch size: 88, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:00:55,439 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 04:01:12,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2469320.0, ans=0.125 2024-08-14 04:01:22,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2469420.0, ans=0.0 2024-08-14 04:01:32,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.764e+01 3.145e+01 1.301e+02, threshold=5.528e+01, percent-clipped=2.0 2024-08-14 04:01:38,502 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 04:02:01,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 600, loss[loss=0.08965, beats_loss=0.01279, ecapa_loss=0.0001622, whisper_loss=0.07524, over 20872.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001578, whisper_loss=0.09019, over 3613563.19 frames. ], batch size: 86, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:02:12,152 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 04:02:31,066 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 04:02:31,460 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.181e-01 2024-08-14 04:02:48,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2470020.0, ans=0.125 2024-08-14 04:02:54,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2470020.0, ans=0.125 2024-08-14 04:02:57,735 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 04:03:12,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-14 04:03:15,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.19 vs. limit=10.0 2024-08-14 04:03:15,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 650, loss[loss=0.1044, beats_loss=0.0127, ecapa_loss=0.0001227, whisper_loss=0.09051, over 23755.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001588, whisper_loss=0.09073, over 3681017.84 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:03:51,224 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 04:03:55,969 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 04:04:02,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.415e+01 2.559e+01 3.017e+01 4.730e+01, threshold=5.119e+01, percent-clipped=1.0 2024-08-14 04:04:04,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2470520.0, ans=0.1 2024-08-14 04:04:32,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 700, loss[loss=0.09772, beats_loss=0.009558, ecapa_loss=0.0001419, whisper_loss=0.08675, over 14667.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.000159, whisper_loss=0.09074, over 3710503.87 frames. ], batch size: 55, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:04:43,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. 
limit=10.0 2024-08-14 04:04:44,459 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 04:04:51,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-14 04:04:55,035 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 04:05:09,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-14 04:05:10,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2470920.0, ans=0.125 2024-08-14 04:05:19,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2471020.0, ans=0.125 2024-08-14 04:05:27,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2471020.0, ans=0.0 2024-08-14 04:05:28,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2471020.0, ans=0.125 2024-08-14 04:05:34,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=12.0 2024-08-14 04:05:40,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2471120.0, ans=0.0 2024-08-14 04:05:47,329 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 750, loss[loss=0.09463, beats_loss=0.01076, ecapa_loss=0.0001576, whisper_loss=0.08229, over 14797.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001583, whisper_loss=0.09066, over 3723964.31 frames. 
], batch size: 59, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:05:55,297 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 04:05:56,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2471220.0, ans=0.0 2024-08-14 04:06:14,423 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 04:06:16,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2471420.0, ans=0.125 2024-08-14 04:06:31,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2471520.0, ans=0.07 2024-08-14 04:06:31,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.490e+01 2.820e+01 4.318e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-14 04:06:32,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2471520.0, ans=0.125 2024-08-14 04:06:34,206 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 04:06:42,698 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 04:06:52,557 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:06:53,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2471620.0, ans=0.125 2024-08-14 04:07:01,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 800, loss[loss=0.09952, beats_loss=0.01159, ecapa_loss=0.00016, whisper_loss=0.08633, over 22759.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001575, whisper_loss=0.08979, over 3756008.01 frames. 
], batch size: 93, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:07:07,251 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 18 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-14 04:07:10,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2471720.0, ans=0.1 2024-08-14 04:07:10,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2024-08-14 04:07:13,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2471720.0, ans=0.125 2024-08-14 04:07:24,347 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 32 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 04:07:33,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2471920.0, ans=0.2 2024-08-14 04:07:34,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2471920.0, ans=0.125 2024-08-14 04:07:40,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2471920.0, ans=0.125 2024-08-14 04:07:40,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2471920.0, ans=0.0 2024-08-14 04:07:44,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2471920.0, ans=0.125 2024-08-14 04:08:17,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 850, loss[loss=0.09204, beats_loss=0.0113, ecapa_loss=0.0001458, whisper_loss=0.07928, over 13219.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001563, whisper_loss=0.09012, over 3778116.68 frames. 
], batch size: 54, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:08:20,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-14 04:08:20,850 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 04:08:27,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2472220.0, ans=0.0 2024-08-14 04:08:27,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2472220.0, ans=0.0 2024-08-14 04:08:45,438 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:09:01,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.448e+01 2.671e+01 3.055e+01 4.887e+01, threshold=5.342e+01, percent-clipped=0.0 2024-08-14 04:09:04,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2472520.0, ans=0.125 2024-08-14 04:09:06,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2472520.0, ans=0.125 2024-08-14 04:09:14,550 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 04:09:16,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2472620.0, ans=0.1 2024-08-14 04:09:33,194 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 900, loss[loss=0.09165, beats_loss=0.01079, ecapa_loss=0.0001532, whisper_loss=0.07932, over 13680.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001561, whisper_loss=0.09029, over 3785474.78 frames. 
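The `optim.py` lines above print five grad-norm quartiles (min, Q1, median, Q3, max) together with `Clipping_scale=2.0` and a threshold. The threshold appears to be the clipping scale times the median grad-norm; this is a sketch of that relationship using the "batch 850" values, not the exact icefall optimizer code:

```python
# Reconstruction of the clipping threshold in the optim.py log lines:
# threshold = Clipping_scale * median grad-norm. Quartiles are taken from the
# "batch 850" line above; the optimizer's internals are an assumption here.
clipping_scale = 2.0
quartiles = [18.80, 24.48, 26.71, 30.55, 48.87]  # min, Q1, median, Q3, max
median = quartiles[2]
threshold = clipping_scale * median  # 53.42, matching the logged 5.342e+01
```

`percent-clipped=0.0` then indicates no gradient in that window exceeded the threshold.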
], batch size: 53, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:09:51,602 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 04:10:15,728 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 04:10:20,229 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 04:10:25,050 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 04:10:29,472 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 04:10:34,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2473120.0, ans=0.0 2024-08-14 04:10:40,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2473120.0, ans=0.0 2024-08-14 04:10:46,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2473120.0, ans=0.2 2024-08-14 04:10:50,827 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 950, loss[loss=0.109, beats_loss=0.01041, ecapa_loss=0.0001458, whisper_loss=0.09717, over 23377.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001548, whisper_loss=0.09015, over 3786972.28 frames. 
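Each `A total of N cuts` line breaks a batch down by source corpus: `LS+wenet` (LibriSpeech + WenetSpeech), `Vox` (VoxCeleb), and `AS` (AudioSet); `fro` is a typo for "from" in the training script itself. The per-source counts sum to the reported total, e.g. for the first cuts line of this batch range:

```python
# Per-corpus cut counts from the "A total of 63 cuts" line above; the
# corpus-name expansions (LibriSpeech+WenetSpeech, VoxCeleb, AudioSet) are
# inferred from the run configuration, not stated in this line.
per_source = {"LS+wenet": 22, "Vox": 16, "AS": 25}
total = sum(per_source.values())  # 63, matching the logged total
```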
], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:11:00,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2473220.0, ans=0.2 2024-08-14 04:11:17,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2473320.0, ans=0.125 2024-08-14 04:11:20,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2473420.0, ans=0.2 2024-08-14 04:11:35,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.279e+01 2.588e+01 3.016e+01 4.728e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 04:11:47,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2473520.0, ans=0.125 2024-08-14 04:11:58,909 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 04:12:04,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1000, loss[loss=0.1216, beats_loss=0.009971, ecapa_loss=0.0001763, whisper_loss=0.1098, over 15604.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001551, whisper_loss=0.09004, over 3758472.55 frames. ], batch size: 61, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:12:19,132 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.523e-01 2024-08-14 04:12:25,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2473820.0, ans=0.1 2024-08-14 04:12:41,643 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 04:12:54,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2474020.0, ans=0.125 2024-08-14 04:12:56,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2474020.0, ans=0.125 2024-08-14 04:13:02,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2474020.0, ans=0.125 2024-08-14 04:13:03,356 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 04:13:21,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1050, loss[loss=0.08804, beats_loss=0.01178, ecapa_loss=0.0001274, whisper_loss=0.07499, over 22045.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001533, whisper_loss=0.09015, over 3777556.40 frames. ], batch size: 84, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:13:25,079 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 04:13:38,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2474320.0, ans=0.2 2024-08-14 04:13:46,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2474320.0, ans=0.1 2024-08-14 04:14:08,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.402e+01 2.807e+01 3.075e+01 7.896e+01, threshold=5.614e+01, percent-clipped=1.0 2024-08-14 04:14:13,075 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 04:14:31,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2474620.0, ans=0.125 2024-08-14 04:14:38,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1100, loss[loss=0.1021, beats_loss=0.01198, ecapa_loss=0.0001284, whisper_loss=0.0888, over 21211.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001533, whisper_loss=0.09055, over 3767654.15 frames. ], batch size: 83, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:15:05,299 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 04:15:10,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2474920.0, ans=0.0 2024-08-14 04:15:17,549 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 04:15:22,039 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 04:15:22,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2475020.0, ans=0.2 2024-08-14 04:15:29,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2475020.0, ans=0.125 2024-08-14 04:15:50,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-14 04:15:52,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1150, loss[loss=0.09253, beats_loss=0.01058, ecapa_loss=0.000141, whisper_loss=0.08054, over 18582.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001538, whisper_loss=0.09061, over 3757295.28 frames. 
], batch size: 71, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:15:52,985 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 04:15:55,929 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 04:16:03,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2475220.0, ans=0.125 2024-08-14 04:16:14,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2475320.0, ans=0.125 2024-08-14 04:16:30,309 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 04:16:38,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.338e+01 2.593e+01 2.937e+01 5.602e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 04:16:38,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2475520.0, ans=0.125 2024-08-14 04:16:39,834 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 04:16:49,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2475520.0, ans=0.035 2024-08-14 04:16:54,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2475620.0, ans=0.125 2024-08-14 04:17:07,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1200, loss[loss=0.1029, beats_loss=0.01085, ecapa_loss=0.0001534, whisper_loss=0.09053, over 23021.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.000155, whisper_loss=0.09081, over 3747167.14 frames. 
], batch size: 94, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:17:13,637 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 04:17:25,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-08-14 04:17:35,406 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 04:17:47,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2475920.0, ans=0.125 2024-08-14 04:17:54,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2476020.0, ans=0.5 2024-08-14 04:17:59,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2476020.0, ans=0.0 2024-08-14 04:18:14,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0 2024-08-14 04:18:15,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2476120.0, ans=0.0 2024-08-14 04:18:18,408 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 04:18:21,511 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1250, loss[loss=0.09637, beats_loss=0.009718, ecapa_loss=0.0001698, whisper_loss=0.08496, over 17813.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001546, whisper_loss=0.09073, over 3749579.56 frames. ], batch size: 73, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:18:31,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.39 vs. 
limit=15.0 2024-08-14 04:18:36,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2476320.0, ans=0.0 2024-08-14 04:18:43,367 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 04:18:59,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2476420.0, ans=0.125 2024-08-14 04:19:07,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.365e+01 2.557e+01 2.889e+01 4.348e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 04:19:15,154 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 04:19:21,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.14 vs. limit=10.0 2024-08-14 04:19:25,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2476620.0, ans=15.0 2024-08-14 04:19:38,286 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1300, loss[loss=0.1226, beats_loss=0.00914, ecapa_loss=0.0001805, whisper_loss=0.1117, over 17992.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001538, whisper_loss=0.09096, over 3776293.57 frames. 
], batch size: 72, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:19:48,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2476720.0, ans=0.1 2024-08-14 04:19:56,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2476820.0, ans=0.125 2024-08-14 04:20:02,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2476820.0, ans=0.125 2024-08-14 04:20:12,480 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 04:20:16,799 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 04:20:21,433 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 04:20:27,125 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 04:20:33,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2477020.0, ans=0.0 2024-08-14 04:20:33,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2477020.0, ans=0.0 2024-08-14 04:20:36,457 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 04:20:46,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2477120.0, ans=0.0 2024-08-14 04:20:55,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1350, loss[loss=0.107, beats_loss=0.008377, ecapa_loss=0.0001942, whisper_loss=0.09672, over 16998.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001532, whisper_loss=0.09079, over 3773913.15 frames. 
], batch size: 69, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:21:00,218 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 04:21:03,152 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 04:21:06,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2477220.0, ans=0.1 2024-08-14 04:21:08,442 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 04:21:12,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2477320.0, ans=0.125 2024-08-14 04:21:14,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2477320.0, ans=0.0 2024-08-14 04:21:18,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2477320.0, ans=0.125 2024-08-14 04:21:29,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2477420.0, ans=0.0 2024-08-14 04:21:37,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2477420.0, ans=0.0 2024-08-14 04:21:41,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.297e+01 2.516e+01 2.764e+01 4.025e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 04:22:11,511 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1400, loss[loss=0.07898, beats_loss=0.0121, ecapa_loss=0.000204, whisper_loss=0.06484, over 19869.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001543, whisper_loss=0.09, over 3767407.20 frames. 
], batch size: 84, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:22:15,030 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 04:22:23,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2477720.0, ans=0.2 2024-08-14 04:23:14,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-08-14 04:24:06,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1450, loss[loss=0.1019, beats_loss=0.009404, ecapa_loss=0.000174, whisper_loss=0.09071, over 15285.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001539, whisper_loss=0.09007, over 3755402.48 frames. ], batch size: 61, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:24:10,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2478220.0, ans=0.0 2024-08-14 04:24:27,526 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 04:24:29,232 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
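Between batch 1400 and batch 1450 the logged `grad_scale` doubles, from 2.8823037615171174e+17 to 5.764607523034235e+17. These are exactly 2^58 and 2^59, consistent with a dynamic loss scaler that doubles its scale after a run of overflow-free steps (PyTorch's `GradScaler` uses `growth_factor=2.0` by default; whether icefall uses that class here is an assumption):

```python
# The two grad_scale values around batches 1400/1450 are successive powers of
# two, the signature of a dynamic loss scaler doubling its scale after a
# growth interval of overflow-free steps.
old_scale = 2.0 ** 58   # 2.8823037615171174e+17, as logged at batch 1400
new_scale = old_scale * 2.0  # 5.764607523034235e+17, as logged at batch 1450
```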
25 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 04:24:32,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2478320.0, ans=0.0 2024-08-14 04:24:36,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2478320.0, ans=0.2 2024-08-14 04:24:55,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.310e+01 2.554e+01 2.920e+01 4.164e+01, threshold=5.108e+01, percent-clipped=0.0 2024-08-14 04:24:56,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2478520.0, ans=0.1 2024-08-14 04:25:12,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2478620.0, ans=0.125 2024-08-14 04:25:20,806 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 04:25:29,164 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1500, loss[loss=0.0937, beats_loss=0.01174, ecapa_loss=0.0001166, whisper_loss=0.08079, over 23095.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001538, whisper_loss=0.09025, over 3752798.50 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:25:33,314 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 04:25:37,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2478720.0, ans=0.0 2024-08-14 04:25:42,676 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 04:25:55,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.98 vs. 
limit=22.5 2024-08-14 04:25:56,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2478820.0, ans=0.1 2024-08-14 04:25:56,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-14 04:26:11,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2478920.0, ans=0.1 2024-08-14 04:26:16,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2478920.0, ans=0.09899494936611666 2024-08-14 04:26:19,207 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:26:22,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2479020.0, ans=0.2 2024-08-14 04:26:29,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2479020.0, ans=0.125 2024-08-14 04:26:31,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-14 04:26:50,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1550, loss[loss=0.08738, beats_loss=0.01088, ecapa_loss=0.000173, whisper_loss=0.07477, over 21090.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001529, whisper_loss=0.09048, over 3746381.38 frames. ], batch size: 88, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:26:52,241 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 04:26:52,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.00 vs. 
limit=15.0 2024-08-14 04:27:10,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2479320.0, ans=0.125 2024-08-14 04:27:13,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-14 04:27:39,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.208e+01 2.513e+01 2.710e+01 4.785e+01, threshold=5.026e+01, percent-clipped=0.0 2024-08-14 04:27:59,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2479620.0, ans=0.2 2024-08-14 04:28:00,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2479620.0, ans=0.1 2024-08-14 04:28:03,232 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 04:28:03,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2479620.0, ans=10.0 2024-08-14 04:28:06,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2479620.0, ans=0.2 2024-08-14 04:28:11,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1600, loss[loss=0.09147, beats_loss=0.009184, ecapa_loss=0.0001869, whisper_loss=0.08041, over 17371.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001522, whisper_loss=0.09057, over 3764771.21 frames. 
], batch size: 72, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:28:13,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2479720.0, ans=0.125 2024-08-14 04:28:22,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2479720.0, ans=0.125 2024-08-14 04:28:29,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2479820.0, ans=0.125 2024-08-14 04:28:31,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2479820.0, ans=0.125 2024-08-14 04:28:36,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2024-08-14 04:28:37,372 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 04:28:40,685 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 04:28:53,821 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-248000.pt 2024-08-14 04:29:02,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2480020.0, ans=0.0 2024-08-14 04:29:04,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2480020.0, ans=0.0 2024-08-14 04:29:06,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2480020.0, ans=0.0 2024-08-14 04:29:08,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2480020.0, ans=0.125 2024-08-14 04:29:11,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2480020.0, ans=0.125 2024-08-14 04:29:13,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=22.5 2024-08-14 04:29:18,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2480120.0, ans=0.125 2024-08-14 04:29:19,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2480120.0, ans=0.0 2024-08-14 04:29:19,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. 
limit=22.5 2024-08-14 04:29:25,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2480120.0, ans=0.05 2024-08-14 04:29:29,136 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 04:29:31,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1650, loss[loss=0.09246, beats_loss=0.01347, ecapa_loss=0.0001224, whisper_loss=0.07777, over 18468.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01053, ecapa_loss=0.0001526, whisper_loss=0.0911, over 3794080.19 frames. ], batch size: 73, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:29:41,622 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:29:42,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2480220.0, ans=0.0 2024-08-14 04:29:45,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2480320.0, ans=0.125 2024-08-14 04:29:52,892 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-14 04:29:56,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.94 vs. 
limit=15.0 2024-08-14 04:30:12,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2480420.0, ans=0.1 2024-08-14 04:30:17,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.355e+01 2.575e+01 2.902e+01 4.492e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 04:30:46,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1700, loss[loss=0.09783, beats_loss=0.006511, ecapa_loss=0.0001788, whisper_loss=0.08953, over 17028.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001535, whisper_loss=0.09042, over 3771869.01 frames. ], batch size: 66, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:30:47,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2480720.0, ans=0.2 2024-08-14 04:30:50,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2480720.0, ans=0.1 2024-08-14 04:30:53,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2480720.0, ans=0.125 2024-08-14 04:30:56,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2480720.0, ans=0.125 2024-08-14 04:30:59,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2480720.0, ans=0.125 2024-08-14 04:31:08,459 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 04:31:30,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2481020.0, ans=0.0 2024-08-14 04:31:42,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2481020.0, ans=0.1 2024-08-14 04:32:00,346 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1750, loss[loss=0.06597, beats_loss=0.01291, ecapa_loss=0.0001081, whisper_loss=0.05198, over 19814.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001533, whisper_loss=0.09024, over 3800881.98 frames. ], batch size: 78, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:32:06,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2481220.0, ans=10.0 2024-08-14 04:32:20,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2481320.0, ans=0.125 2024-08-14 04:32:24,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2481320.0, ans=0.125 2024-08-14 04:32:30,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-14 04:32:44,294 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.328e+01 2.583e+01 3.000e+01 1.080e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-14 04:33:06,344 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 04:33:13,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1800, loss[loss=0.1033, beats_loss=0.007866, ecapa_loss=0.0001943, whisper_loss=0.0935, over 21511.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001547, whisper_loss=0.09028, over 3813128.00 frames. ], batch size: 86, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:33:13,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2481720.0, ans=0.2 2024-08-14 04:33:18,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2481720.0, ans=0.125 2024-08-14 04:33:18,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. limit=10.0 2024-08-14 04:33:25,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2481720.0, ans=0.0 2024-08-14 04:33:37,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2481820.0, ans=0.0 2024-08-14 04:33:37,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2481820.0, ans=0.125 2024-08-14 04:34:27,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1850, loss[loss=0.09176, beats_loss=0.0107, ecapa_loss=0.0001672, whisper_loss=0.07938, over 15788.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.0001537, whisper_loss=0.09103, over 3779805.19 frames. 
], batch size: 66, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:34:37,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2482220.0, ans=0.1 2024-08-14 04:34:38,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2482220.0, ans=0.1 2024-08-14 04:35:13,756 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.339e+01 2.610e+01 2.958e+01 9.834e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-14 04:35:23,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-14 04:35:32,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2482620.0, ans=0.125 2024-08-14 04:35:34,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-08-14 04:35:36,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2482620.0, ans=0.125 2024-08-14 04:35:38,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0 2024-08-14 04:35:40,835 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 04:35:44,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1900, loss[loss=0.09076, beats_loss=0.008834, ecapa_loss=0.0001696, whisper_loss=0.08023, over 18878.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001538, whisper_loss=0.09085, over 3788924.35 frames. 
], batch size: 74, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:35:53,940 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 04:36:08,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.10 vs. limit=10.0 2024-08-14 04:36:15,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2482920.0, ans=0.2 2024-08-14 04:36:18,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2482920.0, ans=0.125 2024-08-14 04:36:26,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2482920.0, ans=0.035 2024-08-14 04:36:29,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2483020.0, ans=10.0 2024-08-14 04:36:29,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=12.0 2024-08-14 04:36:30,645 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
39 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 04:36:43,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2483020.0, ans=0.0 2024-08-14 04:36:45,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2483120.0, ans=15.0 2024-08-14 04:36:48,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2483120.0, ans=0.09899494936611666 2024-08-14 04:36:58,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2483120.0, ans=0.0 2024-08-14 04:37:00,039 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 04:37:01,342 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 1950, loss[loss=0.1027, beats_loss=0.01095, ecapa_loss=0.0001929, whisper_loss=0.08981, over 21226.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001543, whisper_loss=0.09075, over 3806621.98 frames. ], batch size: 92, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:37:08,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.93 vs. 
limit=10.0 2024-08-14 04:37:09,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2483220.0, ans=0.125 2024-08-14 04:37:16,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2483320.0, ans=0.2 2024-08-14 04:37:18,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2483320.0, ans=0.1 2024-08-14 04:37:21,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2483320.0, ans=0.1 2024-08-14 04:37:39,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2483420.0, ans=0.0 2024-08-14 04:37:46,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.351e+01 2.542e+01 2.768e+01 3.987e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 04:38:16,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2000, loss[loss=0.09404, beats_loss=0.01144, ecapa_loss=0.0001589, whisper_loss=0.08101, over 19995.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001539, whisper_loss=0.09077, over 3792754.37 frames. ], batch size: 84, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:38:28,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2483720.0, ans=0.1 2024-08-14 04:38:45,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2483820.0, ans=0.125 2024-08-14 04:38:52,566 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 04:39:14,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2484020.0, ans=0.2 2024-08-14 04:39:37,818 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2050, loss[loss=0.09499, beats_loss=0.01068, ecapa_loss=0.0001846, whisper_loss=0.08246, over 16512.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.000154, whisper_loss=0.09054, over 3843182.21 frames. ], batch size: 70, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:39:40,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2484220.0, ans=0.125 2024-08-14 04:39:59,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.10 vs. limit=15.0 2024-08-14 04:40:01,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2484320.0, ans=0.07 2024-08-14 04:40:03,916 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 04:40:16,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2484420.0, ans=0.1 2024-08-14 04:40:20,890 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-14 04:40:21,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2484420.0, ans=0.0 2024-08-14 04:40:23,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2484420.0, ans=0.125 2024-08-14 04:40:25,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 2.326e+01 2.679e+01 3.072e+01 5.038e+01, threshold=5.357e+01, percent-clipped=0.0 2024-08-14 04:40:30,444 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 04:40:36,505 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 04:40:41,096 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 04:40:57,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2100, loss[loss=0.0899, beats_loss=0.009821, ecapa_loss=0.0001503, whisper_loss=0.07857, over 14504.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.000154, whisper_loss=0.0908, over 3827583.68 frames. ], batch size: 53, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:41:02,961 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 04:41:13,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2484820.0, ans=0.0 2024-08-14 04:41:16,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. 
limit=22.5 2024-08-14 04:41:18,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2484820.0, ans=0.0 2024-08-14 04:41:20,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.60 vs. limit=10.0 2024-08-14 04:41:25,077 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 04:42:14,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2485220.0, ans=0.2 2024-08-14 04:42:15,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2150, loss[loss=0.09214, beats_loss=0.01153, ecapa_loss=0.0001383, whisper_loss=0.07923, over 14583.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.000154, whisper_loss=0.09077, over 3797498.86 frames. ], batch size: 57, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:42:49,192 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 04:43:04,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.306e+01 2.493e+01 2.947e+01 5.632e+01, threshold=4.986e+01, percent-clipped=1.0 2024-08-14 04:43:26,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2024-08-14 04:43:35,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2200, loss[loss=0.1198, beats_loss=0.008355, ecapa_loss=0.0002089, whisper_loss=0.1093, over 23211.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.000154, whisper_loss=0.09037, over 3753337.67 frames. 
], batch size: 94, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:43:44,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2485720.0, ans=0.0 2024-08-14 04:43:45,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-08-14 04:44:03,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2485820.0, ans=0.2 2024-08-14 04:44:10,506 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-14 04:44:25,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-08-14 04:44:35,790 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 04:44:43,913 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 04:44:54,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2250, loss[loss=0.1155, beats_loss=0.01087, ecapa_loss=0.0001611, whisper_loss=0.103, over 22361.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001553, whisper_loss=0.09085, over 3766070.44 frames. ], batch size: 93, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:45:14,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2486320.0, ans=0.125 2024-08-14 04:45:24,985 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
27 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 04:45:42,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.436e+01 2.743e+01 3.250e+01 7.629e+01, threshold=5.485e+01, percent-clipped=1.0 2024-08-14 04:45:44,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2486520.0, ans=0.125 2024-08-14 04:45:48,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-08-14 04:46:05,151 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 04:46:15,007 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2300, loss[loss=0.08227, beats_loss=0.01151, ecapa_loss=0.0001477, whisper_loss=0.06929, over 12988.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001563, whisper_loss=0.09088, over 3807379.26 frames. ], batch size: 53, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:46:23,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.47 vs. 
limit=22.5 2024-08-14 04:46:34,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2486820.0, ans=0.125 2024-08-14 04:46:39,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2486820.0, ans=0.125 2024-08-14 04:46:50,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2486920.0, ans=0.125 2024-08-14 04:47:16,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2487020.0, ans=0.0 2024-08-14 04:47:24,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2487120.0, ans=0.1 2024-08-14 04:47:25,145 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 04:47:34,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2350, loss[loss=0.111, beats_loss=0.01002, ecapa_loss=0.0001602, whisper_loss=0.09938, over 20267.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001571, whisper_loss=0.09068, over 3814887.54 frames. ], batch size: 79, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:47:38,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2487220.0, ans=0.2 2024-08-14 04:47:58,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2487320.0, ans=0.1 2024-08-14 04:48:04,652 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 04:48:19,480 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 04:48:21,032 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
17 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 04:48:22,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.359e+01 2.626e+01 3.027e+01 4.535e+02, threshold=5.251e+01, percent-clipped=2.0 2024-08-14 04:48:37,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2487620.0, ans=0.1 2024-08-14 04:48:39,435 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 04:48:47,897 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 30 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 04:48:51,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2487620.0, ans=6.0 2024-08-14 04:48:51,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=15.0 2024-08-14 04:48:55,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2400, loss[loss=0.1043, beats_loss=0.0112, ecapa_loss=0.0001511, whisper_loss=0.09157, over 20174.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001577, whisper_loss=0.09126, over 3825564.31 frames. ], batch size: 79, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:49:27,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2487920.0, ans=0.1 2024-08-14 04:49:30,665 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 04:49:45,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. 
limit=15.0 2024-08-14 04:50:14,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2450, loss[loss=0.1004, beats_loss=0.01147, ecapa_loss=0.0001564, whisper_loss=0.08735, over 14622.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001588, whisper_loss=0.09053, over 3821596.60 frames. ], batch size: 58, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:50:22,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2488220.0, ans=0.1 2024-08-14 04:50:28,023 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 04:50:57,972 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 04:50:58,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2488420.0, ans=0.125 2024-08-14 04:51:00,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.556e+01 2.864e+01 5.420e+01, threshold=5.112e+01, percent-clipped=1.0 2024-08-14 04:51:01,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2488520.0, ans=0.0 2024-08-14 04:51:04,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2488520.0, ans=0.0 2024-08-14 04:51:13,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2488520.0, ans=0.0 2024-08-14 04:51:18,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2488620.0, ans=0.95 2024-08-14 04:51:32,007 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2500, loss[loss=0.1192, beats_loss=0.01073, ecapa_loss=0.0001553, whisper_loss=0.1069, over 23294.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001579, whisper_loss=0.09095, over 3817265.46 frames. ], batch size: 93, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:51:36,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2024-08-14 04:52:31,889 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 04:52:35,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2489120.0, ans=0.125 2024-08-14 04:52:52,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2489220.0, ans=0.125 2024-08-14 04:52:53,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2550, loss[loss=0.1066, beats_loss=0.009184, ecapa_loss=0.0001667, whisper_loss=0.09579, over 21211.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.0001575, whisper_loss=0.09197, over 3826006.19 frames. ], batch size: 83, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:52:54,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2489220.0, ans=0.125 2024-08-14 04:52:54,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2489220.0, ans=0.2 2024-08-14 04:52:59,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2489220.0, ans=0.125 2024-08-14 04:53:04,418 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 04:53:17,331 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 04:53:32,546 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 04:53:35,243 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 04:53:39,493 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 04:53:43,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.452e+01 2.668e+01 3.104e+01 5.723e+01, threshold=5.337e+01, percent-clipped=1.0 2024-08-14 04:53:45,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2489520.0, ans=0.125 2024-08-14 04:53:45,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2489520.0, ans=0.125 2024-08-14 04:54:00,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-14 04:54:14,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2600, loss[loss=0.1034, beats_loss=0.008748, ecapa_loss=0.0001692, whisper_loss=0.093, over 16948.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01059, ecapa_loss=0.0001563, whisper_loss=0.09253, over 3861828.18 frames. ], batch size: 67, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:54:17,127 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 04:54:44,333 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 04:54:48,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2489820.0, ans=0.1 2024-08-14 04:55:00,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2489920.0, ans=0.125 2024-08-14 04:55:14,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.97 vs. limit=10.0 2024-08-14 04:55:15,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2490020.0, ans=0.125 2024-08-14 04:55:35,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2490120.0, ans=0.125 2024-08-14 04:55:51,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2650, loss[loss=0.1092, beats_loss=0.008277, ecapa_loss=0.0001615, whisper_loss=0.09934, over 18116.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.0001558, whisper_loss=0.09205, over 3860246.77 frames. 
], batch size: 69, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:55:53,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2490220.0, ans=0.2 2024-08-14 04:56:45,320 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.391e+01 2.607e+01 2.986e+01 4.430e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-14 04:56:47,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2490520.0, ans=0.125 2024-08-14 04:56:50,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2490520.0, ans=0.125 2024-08-14 04:56:56,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2490520.0, ans=0.1 2024-08-14 04:56:57,813 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0513666495680809, model_norm_threshold=52.13920593261719 2024-08-14 04:56:58,015 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.106e+05, grad_sumsq=1.106e+05, orig_rms_sq=1.000e+00 2024-08-14 04:57:26,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2700, loss[loss=0.09758, beats_loss=0.01278, ecapa_loss=0.0001315, whisper_loss=0.08349, over 21660.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001557, whisper_loss=0.09192, over 3878906.60 frames. ], batch size: 86, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:57:34,920 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 04:57:36,392 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 04:58:26,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2490920.0, ans=0.125 2024-08-14 04:58:26,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2490920.0, ans=0.0 2024-08-14 04:58:41,256 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 04:59:13,605 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 04:59:26,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2750, loss[loss=0.08002, beats_loss=0.01114, ecapa_loss=0.0001079, whisper_loss=0.0678, over 15025.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001551, whisper_loss=0.09185, over 3884160.03 frames. ], batch size: 58, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:59:57,037 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 05:00:00,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=22.5 2024-08-14 05:00:37,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.424e+01 2.607e+01 2.892e+01 1.015e+03, threshold=5.215e+01, percent-clipped=3.0 2024-08-14 05:01:04,398 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 05:01:27,206 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2800, loss[loss=0.08249, beats_loss=0.01171, ecapa_loss=0.000145, whisper_loss=0.06933, over 15764.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01073, ecapa_loss=0.0001551, whisper_loss=0.09166, over 3855358.78 frames. 
], batch size: 61, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:01:30,206 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 05:01:41,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2024-08-14 05:02:46,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2492020.0, ans=0.2 2024-08-14 05:03:20,970 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2850, loss[loss=0.1045, beats_loss=0.0127, ecapa_loss=0.0001477, whisper_loss=0.09031, over 15899.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001548, whisper_loss=0.09107, over 3816473.57 frames. ], batch size: 63, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:03:42,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2492320.0, ans=0.0 2024-08-14 05:03:57,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2492420.0, ans=0.5 2024-08-14 05:04:00,704 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 05:04:03,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2492420.0, ans=0.125 2024-08-14 05:04:06,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.327e+01 2.505e+01 2.806e+01 7.430e+01, threshold=5.010e+01, percent-clipped=1.0 2024-08-14 05:04:37,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2900, loss[loss=0.08709, beats_loss=0.01235, ecapa_loss=0.0001669, whisper_loss=0.07308, over 20836.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001565, whisper_loss=0.09105, over 3844447.65 frames. 
], batch size: 90, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:04:45,517 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 05:04:50,493 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-14 05:05:05,585 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 05:05:05,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2492820.0, ans=0.125 2024-08-14 05:05:09,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-14 05:05:24,929 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 05:05:52,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 2950, loss[loss=0.09801, beats_loss=0.01125, ecapa_loss=0.0001247, whisper_loss=0.08551, over 20142.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01085, ecapa_loss=0.0001568, whisper_loss=0.09085, over 3853404.38 frames. ], batch size: 79, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:05:55,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2493220.0, ans=0.125 2024-08-14 05:06:05,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. 
limit=6.0 2024-08-14 05:06:27,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2493420.0, ans=0.5 2024-08-14 05:06:29,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2493420.0, ans=0.1 2024-08-14 05:06:31,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.05 vs. limit=10.0 2024-08-14 05:06:34,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.417e+01 2.624e+01 2.963e+01 8.640e+01, threshold=5.248e+01, percent-clipped=1.0 2024-08-14 05:06:35,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0 2024-08-14 05:06:36,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-14 05:06:42,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=15.0 2024-08-14 05:06:43,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2493520.0, ans=0.125 2024-08-14 05:06:57,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.08 vs. limit=22.5 2024-08-14 05:07:03,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3000, loss[loss=0.07352, beats_loss=0.01213, ecapa_loss=0.0001574, whisper_loss=0.05982, over 14552.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001571, whisper_loss=0.09102, over 3898633.43 frames. 
], batch size: 60, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:07:03,640 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 05:07:44,608 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on ASR_libri: loss=0.2518, beats_loss=0, ecapa_loss=0.0005463, whisper_loss=0.2464, over 922467.00 frames. 2024-08-14 05:08:00,242 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on SV_voxceleb1: loss=0.004304, beats_loss=0, ecapa_loss=0.0004304, whisper_loss=0, over 939242.00 frames. 2024-08-14 05:10:04,625 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on AT_audioset: loss=0.02354, beats_loss=0.02354, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 05:10:04,630 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 05:10:12,025 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 05:10:18,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2493820.0, ans=0.125 2024-08-14 05:10:22,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2493820.0, ans=0.125 2024-08-14 05:10:38,421 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 05:10:55,971 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 05:10:57,547 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 05:11:16,045 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 05:11:17,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3050, loss[loss=0.09971, beats_loss=0.01225, ecapa_loss=0.0001451, whisper_loss=0.08601, over 21719.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01083, ecapa_loss=0.0001572, whisper_loss=0.09232, over 3908568.05 frames. ], batch size: 87, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:11:30,250 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 05:11:40,372 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 05:11:59,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.503e+01 2.783e+01 3.185e+01 5.631e+01, threshold=5.566e+01, percent-clipped=1.0 2024-08-14 05:12:28,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3100, loss[loss=0.1129, beats_loss=0.01073, ecapa_loss=0.0001526, whisper_loss=0.1006, over 14920.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01085, ecapa_loss=0.0001575, whisper_loss=0.09256, over 3897933.68 frames. ], batch size: 57, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:12:32,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-08-14 05:12:42,038 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 05:12:48,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=22.5 2024-08-14 05:12:54,129 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 05:12:55,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2494820.0, ans=0.07 2024-08-14 05:12:58,427 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
21 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-14 05:13:03,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2494920.0, ans=0.2 2024-08-14 05:13:05,735 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 10 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 05:13:07,168 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 39 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 05:13:30,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2495120.0, ans=0.0 2024-08-14 05:13:42,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3150, loss[loss=0.1016, beats_loss=0.01007, ecapa_loss=0.0001583, whisper_loss=0.08998, over 22510.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.0001569, whisper_loss=0.09158, over 3852111.72 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:14:15,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2495420.0, ans=0.125 2024-08-14 05:14:20,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2495420.0, ans=0.0 2024-08-14 05:14:23,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2495420.0, ans=0.1 2024-08-14 05:14:25,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.581e+01 2.876e+01 7.737e+01, threshold=5.161e+01, percent-clipped=2.0 2024-08-14 05:14:30,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2495520.0, ans=0.0 2024-08-14 05:14:53,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. 
limit=6.0 2024-08-14 05:14:55,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3200, loss[loss=0.1107, beats_loss=0.009923, ecapa_loss=0.0001565, whisper_loss=0.09924, over 17291.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001583, whisper_loss=0.09207, over 3810658.32 frames. ], batch size: 67, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:14:56,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2495720.0, ans=0.125 2024-08-14 05:14:58,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.98 vs. limit=15.0 2024-08-14 05:15:01,799 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 05:15:11,643 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-14 05:15:21,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2495820.0, ans=0.1 2024-08-14 05:15:24,344 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 05:15:26,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2495920.0, ans=0.04949747468305833 2024-08-14 05:15:39,510 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 05:15:47,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.54 vs. 
limit=15.0 2024-08-14 05:15:57,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2496120.0, ans=0.0 2024-08-14 05:15:58,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2496120.0, ans=0.2 2024-08-14 05:16:08,368 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3250, loss[loss=0.1167, beats_loss=0.01047, ecapa_loss=0.0001509, whisper_loss=0.1048, over 23091.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01075, ecapa_loss=0.0001584, whisper_loss=0.0919, over 3818497.34 frames. ], batch size: 92, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:16:19,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2496220.0, ans=0.125 2024-08-14 05:16:26,289 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 05:16:49,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2496420.0, ans=0.0 2024-08-14 05:16:51,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.408e+01 2.775e+01 3.145e+01 3.018e+02, threshold=5.551e+01, percent-clipped=3.0 2024-08-14 05:16:51,430 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 05:16:56,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2496520.0, ans=0.2 2024-08-14 05:17:17,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2496620.0, ans=0.125 2024-08-14 05:17:20,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3300, loss[loss=0.1131, beats_loss=0.01111, ecapa_loss=0.0001461, whisper_loss=0.1005, over 20458.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001591, whisper_loss=0.09175, over 3841437.92 frames. ], batch size: 84, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:17:20,893 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 05:17:37,574 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 05:17:40,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2496820.0, ans=0.2 2024-08-14 05:17:56,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2496920.0, ans=0.125 2024-08-14 05:18:14,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=15.0 2024-08-14 05:18:15,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-14 05:18:33,901 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3350, loss[loss=0.09986, beats_loss=0.01094, ecapa_loss=0.000167, whisper_loss=0.08724, over 22028.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.0001589, whisper_loss=0.09195, over 3846980.37 frames. ], batch size: 91, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:18:43,288 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
29 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 05:19:02,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2497420.0, ans=0.09899494936611666 2024-08-14 05:19:08,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2497420.0, ans=0.125 2024-08-14 05:19:11,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2497420.0, ans=0.95 2024-08-14 05:19:17,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.312e+01 2.517e+01 2.799e+01 4.556e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-14 05:19:22,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2497520.0, ans=0.125 2024-08-14 05:19:22,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2497520.0, ans=0.1 2024-08-14 05:19:47,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3400, loss[loss=0.09967, beats_loss=0.01155, ecapa_loss=0.0001411, whisper_loss=0.08671, over 17812.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01069, ecapa_loss=0.000158, whisper_loss=0.092, over 3884264.76 frames. ], batch size: 72, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:20:18,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.23 vs. limit=10.0 2024-08-14 05:20:22,340 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 05:20:30,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2498020.0, ans=0.07 2024-08-14 05:20:30,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-08-14 05:20:42,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2498020.0, ans=0.125 2024-08-14 05:20:59,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3450, loss[loss=0.1185, beats_loss=0.007445, ecapa_loss=0.0001819, whisper_loss=0.1093, over 19825.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.0001586, whisper_loss=0.09191, over 3892559.88 frames. ], batch size: 77, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:21:02,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.75 vs. limit=22.5 2024-08-14 05:21:43,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.288e+01 2.702e+01 3.056e+01 2.683e+02, threshold=5.405e+01, percent-clipped=1.0 2024-08-14 05:21:43,661 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 05:21:49,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2498520.0, ans=0.125 2024-08-14 05:21:52,515 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 05:22:01,474 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 05:22:12,749 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3500, loss[loss=0.1064, beats_loss=0.01056, ecapa_loss=0.0001439, whisper_loss=0.09444, over 18613.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01062, ecapa_loss=0.0001585, whisper_loss=0.0928, over 3875596.29 frames. ], batch size: 74, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:22:17,581 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-14 05:22:19,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2498720.0, ans=0.5 2024-08-14 05:22:45,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2498920.0, ans=0.2 2024-08-14 05:23:06,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2499020.0, ans=0.0 2024-08-14 05:23:07,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2024-08-14 05:23:13,657 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 05:23:22,813 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 05:23:25,425 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3550, loss[loss=0.08403, beats_loss=0.01055, ecapa_loss=0.0001814, whisper_loss=0.07166, over 13851.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001592, whisper_loss=0.09176, over 3881540.40 frames. 
], batch size: 55, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:23:29,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2499220.0, ans=0.125 2024-08-14 05:23:40,446 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 05:23:54,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2499420.0, ans=0.0 2024-08-14 05:24:04,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2499420.0, ans=0.125 2024-08-14 05:24:06,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2499420.0, ans=0.125 2024-08-14 05:24:09,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2499520.0, ans=0.125 2024-08-14 05:24:09,080 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.740e+01 2024-08-14 05:24:10,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.402e+01 2.607e+01 2.928e+01 5.339e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-14 05:24:10,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2499520.0, ans=0.125 2024-08-14 05:24:31,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2499620.0, ans=0.125 2024-08-14 05:24:35,446 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 05:24:39,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3600, loss[loss=0.08881, beats_loss=0.01145, ecapa_loss=0.0001528, whisper_loss=0.07584, over 20360.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001598, whisper_loss=0.09195, over 3899621.84 frames. ], batch size: 84, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:24:51,913 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 05:24:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2499720.0, ans=0.125 2024-08-14 05:24:56,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2499820.0, ans=0.125 2024-08-14 05:25:01,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2499820.0, ans=0.0 2024-08-14 05:25:15,092 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 05:25:50,618 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 05:25:53,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3650, loss[loss=0.09288, beats_loss=0.01435, ecapa_loss=0.0001054, whisper_loss=0.07748, over 17669.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.00016, whisper_loss=0.09126, over 3903091.71 frames. ], batch size: 66, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:26:05,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2500220.0, ans=0.0 2024-08-14 05:26:08,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2500320.0, ans=0.1 2024-08-14 05:26:17,116 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 05:26:38,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.437e+01 2.673e+01 3.010e+01 1.345e+02, threshold=5.347e+01, percent-clipped=1.0 2024-08-14 05:26:44,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2500520.0, ans=0.125 2024-08-14 05:27:07,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3700, loss[loss=0.1005, beats_loss=0.01055, ecapa_loss=0.0001647, whisper_loss=0.08831, over 18615.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001601, whisper_loss=0.09113, over 3893824.17 frames. ], batch size: 73, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:27:28,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2024-08-14 05:27:35,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2500920.0, ans=0.1 2024-08-14 05:27:39,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2500920.0, ans=0.0 2024-08-14 05:27:52,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2501020.0, ans=0.125 2024-08-14 05:27:52,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2501020.0, ans=0.0 2024-08-14 05:28:02,348 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 05:28:05,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2501120.0, ans=0.0 2024-08-14 05:28:19,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3750, loss[loss=0.09772, beats_loss=0.01062, ecapa_loss=0.0001646, whisper_loss=0.08546, over 17280.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001605, whisper_loss=0.09096, over 3884564.08 frames. ], batch size: 71, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:28:30,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2501220.0, ans=0.125 2024-08-14 05:28:33,617 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 05:28:49,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2501420.0, ans=0.1 2024-08-14 05:28:50,796 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 28 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 05:28:55,130 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 05:28:55,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2501420.0, ans=0.0 2024-08-14 05:29:03,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.382e+01 2.609e+01 2.989e+01 8.009e+01, threshold=5.218e+01, percent-clipped=2.0 2024-08-14 05:29:06,888 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 05:29:10,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2501520.0, ans=0.0 2024-08-14 05:29:10,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. limit=6.0 2024-08-14 05:29:12,548 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 05:29:17,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2501620.0, ans=0.1 2024-08-14 05:29:24,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2501620.0, ans=0.2 2024-08-14 05:29:32,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3800, loss[loss=0.1085, beats_loss=0.01212, ecapa_loss=0.0001432, whisper_loss=0.09498, over 23522.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001588, whisper_loss=0.091, over 3849367.03 frames. ], batch size: 92, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:29:49,335 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 05:29:50,696 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 05:29:52,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2501820.0, ans=0.0 2024-08-14 05:30:01,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2501920.0, ans=0.125 2024-08-14 05:30:15,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2501920.0, ans=0.125 2024-08-14 05:30:16,525 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-14 05:30:31,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2502120.0, ans=0.0 2024-08-14 05:30:34,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2502120.0, ans=0.125 2024-08-14 05:30:34,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2502120.0, ans=0.2 2024-08-14 05:30:36,596 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 05:30:40,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-14 05:30:42,759 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 05:30:44,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2502120.0, ans=0.125 2024-08-14 05:30:46,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3850, loss[loss=0.1183, beats_loss=0.01131, ecapa_loss=0.0001318, whisper_loss=0.1057, over 23465.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001583, whisper_loss=0.09099, over 3838808.03 frames. ], batch size: 90, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:30:48,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2502220.0, ans=0.125 2024-08-14 05:30:49,601 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-14 05:30:54,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2502220.0, ans=0.125 2024-08-14 05:31:03,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=22.5 2024-08-14 05:31:19,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2502420.0, ans=0.125 2024-08-14 05:31:21,106 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 05:31:24,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2502420.0, ans=0.125 2024-08-14 05:31:28,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.75 vs. 
limit=22.5 2024-08-14 05:31:29,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.353e+01 2.523e+01 2.870e+01 4.680e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-14 05:31:40,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2502520.0, ans=0.0 2024-08-14 05:31:57,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2502620.0, ans=0.125 2024-08-14 05:31:59,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3900, loss[loss=0.104, beats_loss=0.01135, ecapa_loss=0.0001383, whisper_loss=0.09124, over 18082.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001584, whisper_loss=0.09105, over 3849162.36 frames. ], batch size: 71, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:32:12,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2502820.0, ans=0.125 2024-08-14 05:32:16,798 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 05:32:21,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2502820.0, ans=0.125 2024-08-14 05:32:30,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2502920.0, ans=0.0 2024-08-14 05:32:38,572 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 05:32:39,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. 
limit=6.0 2024-08-14 05:32:43,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2503020.0, ans=0.125 2024-08-14 05:32:46,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2503020.0, ans=0.95 2024-08-14 05:33:12,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 3950, loss[loss=0.1061, beats_loss=0.0125, ecapa_loss=0.0001332, whisper_loss=0.09227, over 22880.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001601, whisper_loss=0.09099, over 3859482.79 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:33:33,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2503320.0, ans=0.125 2024-08-14 05:33:42,518 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 05:33:48,578 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 05:33:55,871 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.430e+01 2.817e+01 3.192e+01 2.202e+02, threshold=5.633e+01, percent-clipped=4.0 2024-08-14 05:34:10,170 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 05:34:25,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4000, loss[loss=0.1029, beats_loss=0.01151, ecapa_loss=0.0001561, whisper_loss=0.08987, over 19246.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01057, ecapa_loss=0.0001605, whisper_loss=0.09198, over 3902694.08 frames. 
], batch size: 73, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:34:34,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2503720.0, ans=0.0 2024-08-14 05:34:51,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2503820.0, ans=0.1 2024-08-14 05:35:05,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2503920.0, ans=0.125 2024-08-14 05:35:26,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2024-08-14 05:35:27,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2504120.0, ans=0.125 2024-08-14 05:35:31,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2504120.0, ans=0.07 2024-08-14 05:35:33,012 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 05:35:38,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4050, loss[loss=0.09831, beats_loss=0.01205, ecapa_loss=0.0001832, whisper_loss=0.08442, over 16824.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0106, ecapa_loss=0.0001617, whisper_loss=0.09222, over 3910224.24 frames. 
], batch size: 70, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:36:05,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2504320.0, ans=0.125 2024-08-14 05:36:16,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2504420.0, ans=0.125 2024-08-14 05:36:22,929 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.679e+01 2.286e+01 2.527e+01 2.897e+01 4.039e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-14 05:36:23,246 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 05:36:25,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2504520.0, ans=0.0 2024-08-14 05:36:38,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2504620.0, ans=0.125 2024-08-14 05:36:51,934 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4100, loss[loss=0.09018, beats_loss=0.01295, ecapa_loss=0.0001338, whisper_loss=0.07589, over 18447.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.0001613, whisper_loss=0.09144, over 3924683.88 frames. ], batch size: 76, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:37:25,882 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:37:26,870 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 05:37:27,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2504920.0, ans=15.0 2024-08-14 05:37:34,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2505020.0, ans=0.035 2024-08-14 05:37:34,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2505020.0, ans=0.0 2024-08-14 05:37:53,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2505120.0, ans=0.1 2024-08-14 05:37:55,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2505120.0, ans=0.1 2024-08-14 05:38:01,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2505120.0, ans=0.125 2024-08-14 05:38:04,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4150, loss[loss=0.1148, beats_loss=0.01106, ecapa_loss=0.0001584, whisper_loss=0.1021, over 22191.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001601, whisper_loss=0.09119, over 3937574.66 frames. ], batch size: 90, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:38:05,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2505220.0, ans=0.1 2024-08-14 05:38:15,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2505220.0, ans=0.05 2024-08-14 05:38:25,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.30 vs. 
limit=22.5 2024-08-14 05:38:29,149 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 05:38:30,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2505320.0, ans=0.1 2024-08-14 05:38:33,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2505420.0, ans=0.1 2024-08-14 05:38:39,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2505420.0, ans=0.125 2024-08-14 05:38:41,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2505420.0, ans=0.125 2024-08-14 05:38:49,810 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.395e+01 2.659e+01 2.961e+01 5.291e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-14 05:38:50,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2505520.0, ans=0.125 2024-08-14 05:38:51,633 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 05:39:02,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2505620.0, ans=0.0 2024-08-14 05:39:03,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2505620.0, ans=0.125 2024-08-14 05:39:17,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4200, loss[loss=0.1127, beats_loss=0.009734, ecapa_loss=0.0001295, whisper_loss=0.1016, over 18711.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001596, whisper_loss=0.09097, over 3902399.51 frames. 
], batch size: 70, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:39:18,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2505720.0, ans=0.0 2024-08-14 05:39:29,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2505720.0, ans=0.125 2024-08-14 05:39:43,090 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 05:40:01,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=6.0 2024-08-14 05:40:08,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2506020.0, ans=0.0 2024-08-14 05:40:26,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2506120.0, ans=0.125 2024-08-14 05:40:31,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4250, loss[loss=0.09798, beats_loss=0.0126, ecapa_loss=0.0001145, whisper_loss=0.08424, over 22948.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001582, whisper_loss=0.09083, over 3936589.03 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:40:38,403 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 05:40:49,803 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 15 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 05:40:50,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0 2024-08-14 05:40:56,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=32.05 vs. 
limit=22.5 2024-08-14 05:40:59,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-08-14 05:41:02,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-14 05:41:09,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2506420.0, ans=0.125 2024-08-14 05:41:16,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.332e+01 2.542e+01 2.821e+01 5.499e+01, threshold=5.083e+01, percent-clipped=1.0 2024-08-14 05:41:30,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2506620.0, ans=0.2 2024-08-14 05:41:41,681 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 05:41:44,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4300, loss[loss=0.1305, beats_loss=0.0102, ecapa_loss=0.0001793, whisper_loss=0.1185, over 23903.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001581, whisper_loss=0.09065, over 3929930.70 frames. ], batch size: 93, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:41:54,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2506720.0, ans=0.125 2024-08-14 05:41:55,577 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 05:42:01,645 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
20 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-14 05:42:22,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2506920.0, ans=0.025 2024-08-14 05:42:25,313 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 05:42:53,500 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 05:42:58,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2507220.0, ans=0.0 2024-08-14 05:42:59,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4350, loss[loss=0.09551, beats_loss=0.01052, ecapa_loss=0.0001395, whisper_loss=0.0836, over 22196.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001585, whisper_loss=0.09036, over 3901329.52 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:43:14,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2507320.0, ans=0.125 2024-08-14 05:43:21,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-08-14 05:43:30,075 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 05:43:35,886 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-14 05:43:38,551 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 05:43:43,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2507520.0, ans=0.125 2024-08-14 05:43:43,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.369e+01 2.648e+01 3.108e+01 4.930e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-14 05:43:48,624 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 05:44:08,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2507620.0, ans=0.125 2024-08-14 05:44:12,384 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4400, loss[loss=0.1173, beats_loss=0.01011, ecapa_loss=0.0001682, whisper_loss=0.1055, over 16088.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001589, whisper_loss=0.09087, over 3899735.09 frames. ], batch size: 62, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:44:13,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2507720.0, ans=0.1 2024-08-14 05:45:01,288 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 05:45:10,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2508020.0, ans=0.125 2024-08-14 05:45:27,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4450, loss[loss=0.1042, beats_loss=0.01111, ecapa_loss=0.0001439, whisper_loss=0.09169, over 16819.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001577, whisper_loss=0.09052, over 3910877.30 frames. ], batch size: 65, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:45:31,289 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 05:45:39,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=12.0 2024-08-14 05:45:40,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-14 05:45:56,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2508420.0, ans=0.125 2024-08-14 05:46:03,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2508420.0, ans=0.0 2024-08-14 05:46:13,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.408e+01 2.704e+01 3.117e+01 4.091e+01, threshold=5.407e+01, percent-clipped=0.0 2024-08-14 05:46:20,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2508520.0, ans=0.0 2024-08-14 05:46:21,670 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 05:46:33,548 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 05:46:42,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4500, loss[loss=0.1301, beats_loss=0.008771, ecapa_loss=0.0001864, whisper_loss=0.1194, over 13813.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001597, whisper_loss=0.09038, over 3871772.87 frames. 
], batch size: 58, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:47:02,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2508820.0, ans=0.0 2024-08-14 05:47:03,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2508820.0, ans=0.1 2024-08-14 05:47:09,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-08-14 05:47:28,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2509020.0, ans=0.0 2024-08-14 05:47:34,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2509020.0, ans=0.125 2024-08-14 05:47:48,919 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 05:47:52,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2024-08-14 05:48:00,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2509220.0, ans=0.125 2024-08-14 05:48:01,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4550, loss[loss=0.1146, beats_loss=0.007925, ecapa_loss=0.0002025, whisper_loss=0.1047, over 20658.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001597, whisper_loss=0.09067, over 3886263.47 frames. 
], batch size: 85, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:48:07,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2509220.0, ans=0.125 2024-08-14 05:48:08,313 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-14 05:48:12,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2509220.0, ans=0.125 2024-08-14 05:48:44,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2509420.0, ans=0.5 2024-08-14 05:48:47,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.72 vs. limit=15.0 2024-08-14 05:48:51,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.350e+01 2.581e+01 3.010e+01 9.450e+01, threshold=5.163e+01, percent-clipped=2.0 2024-08-14 05:49:01,152 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 05:49:20,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4600, loss[loss=0.07954, beats_loss=0.01403, ecapa_loss=0.0001339, whisper_loss=0.06417, over 20806.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01081, ecapa_loss=0.0001574, whisper_loss=0.09012, over 3894136.42 frames. ], batch size: 86, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:49:42,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2024-08-14 05:49:43,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2509820.0, ans=0.2 2024-08-14 05:49:44,883 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 05:49:49,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2509820.0, ans=0.035 2024-08-14 05:49:52,623 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-14 05:50:02,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2509920.0, ans=0.125 2024-08-14 05:50:08,531 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 05:50:15,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2024-08-14 05:50:39,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2510220.0, ans=0.125 2024-08-14 05:50:41,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4650, loss[loss=0.09614, beats_loss=0.01156, ecapa_loss=0.0001736, whisper_loss=0.08285, over 20693.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01083, ecapa_loss=0.000158, whisper_loss=0.0894, over 3878607.90 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:50:46,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2024-08-14 05:50:54,342 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
27 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 05:50:59,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2510320.0, ans=0.0 2024-08-14 05:50:59,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2510320.0, ans=0.2 2024-08-14 05:51:07,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2510320.0, ans=0.0 2024-08-14 05:51:18,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2510420.0, ans=0.125 2024-08-14 05:51:24,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-14 05:51:25,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2510420.0, ans=0.125 2024-08-14 05:51:30,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.365e+01 2.623e+01 2.877e+01 4.425e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-14 05:51:52,877 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 05:52:00,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4700, loss[loss=0.08981, beats_loss=0.01396, ecapa_loss=0.0001353, whisper_loss=0.07449, over 19612.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01083, ecapa_loss=0.000158, whisper_loss=0.08979, over 3874328.93 frames. ], batch size: 82, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:52:00,878 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 05:52:13,670 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
17 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 05:52:20,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2510820.0, ans=0.0 2024-08-14 05:53:19,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4750, loss[loss=0.1169, beats_loss=0.009812, ecapa_loss=0.0001808, whisper_loss=0.1053, over 20638.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01081, ecapa_loss=0.0001571, whisper_loss=0.09025, over 3881713.21 frames. ], batch size: 87, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:53:30,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2511220.0, ans=0.0 2024-08-14 05:53:33,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2511220.0, ans=0.0 2024-08-14 05:53:55,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2511420.0, ans=0.0 2024-08-14 05:53:55,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2511420.0, ans=0.0 2024-08-14 05:53:58,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. 
limit=15.0 2024-08-14 05:54:03,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2511420.0, ans=0.125 2024-08-14 05:54:08,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.355e+01 2.556e+01 2.982e+01 9.125e+01, threshold=5.113e+01, percent-clipped=1.0 2024-08-14 05:54:13,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2511520.0, ans=0.125 2024-08-14 05:54:17,957 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 05:54:18,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2511520.0, ans=0.1 2024-08-14 05:54:19,519 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 05:54:27,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2511620.0, ans=0.125 2024-08-14 05:54:39,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4800, loss[loss=0.104, beats_loss=0.01117, ecapa_loss=0.0001534, whisper_loss=0.09125, over 19028.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001577, whisper_loss=0.09003, over 3903293.54 frames. ], batch size: 74, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:54:41,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2511720.0, ans=0.125 2024-08-14 05:54:41,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2511720.0, ans=0.125 2024-08-14 05:54:54,742 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 05:55:00,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2511820.0, ans=0.0 2024-08-14 05:55:03,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2511820.0, ans=0.125 2024-08-14 05:55:36,596 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 05:55:38,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-14 05:55:39,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2512020.0, ans=0.125 2024-08-14 05:55:43,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2512120.0, ans=0.125 2024-08-14 05:55:44,886 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 05:55:45,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2512120.0, ans=0.07 2024-08-14 05:55:52,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2512120.0, ans=0.125 2024-08-14 05:56:01,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4850, loss[loss=0.1151, beats_loss=0.009636, ecapa_loss=0.0001204, whisper_loss=0.1042, over 16587.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01085, ecapa_loss=0.0001572, whisper_loss=0.08942, over 3934057.35 frames. 
], batch size: 61, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:56:10,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2024-08-14 05:56:11,721 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 from AS 2024-08-14 05:56:17,934 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 25 from Vox, 26 from AS 2024-08-14 05:56:18,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-14 05:56:22,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2512320.0, ans=0.0 2024-08-14 05:56:40,782 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 23 from Vox, 28 from AS 2024-08-14 05:56:47,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2512420.0, ans=0.2 2024-08-14 05:56:51,712 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.430e+01 2.607e+01 2.996e+01 1.441e+02, threshold=5.214e+01, percent-clipped=2.0 2024-08-14 05:57:08,623 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 05:57:10,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2512620.0, ans=0.1 2024-08-14 05:57:11,593 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 from AS 2024-08-14 05:57:22,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4900, loss[loss=0.1158, beats_loss=0.01151, ecapa_loss=0.0001648, whisper_loss=0.1026, over 21635.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.0001564, whisper_loss=0.09027, over 3924696.70 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:57:22,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2512720.0, ans=0.125 2024-08-14 05:57:24,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2512720.0, ans=0.125 2024-08-14 05:57:36,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2512820.0, ans=0.125 2024-08-14 05:57:40,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2024-08-14 05:57:41,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2512820.0, ans=0.1 2024-08-14 05:57:44,001 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 20 from Vox, 30 from AS 2024-08-14 05:57:59,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2512920.0, ans=10.0 2024-08-14 05:58:02,835 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 16 from Vox, 32 from AS 2024-08-14 05:58:04,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2512920.0, ans=0.0 2024-08-14 05:58:10,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2513020.0, ans=0.1 2024-08-14 05:58:11,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2513020.0, ans=0.125 2024-08-14 05:58:28,317 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS 2024-08-14 05:58:33,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2513120.0, ans=0.5 2024-08-14 05:58:33,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2513120.0, ans=0.0 2024-08-14 05:58:40,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 4950, loss[loss=0.08187, beats_loss=0.01209, ecapa_loss=0.0001529, whisper_loss=0.06825, over 20782.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001566, whisper_loss=0.09054, over 3895586.69 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:58:40,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2513220.0, ans=0.125 2024-08-14 05:58:48,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2513220.0, ans=0.125 2024-08-14 05:58:54,252 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 22 from Vox, 27 from AS 2024-08-14 05:59:02,525 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
24 from LS+wenet, 19 from Vox, 30 from AS 2024-08-14 05:59:07,320 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 from AS 2024-08-14 05:59:09,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2513320.0, ans=0.1 2024-08-14 05:59:29,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.393e+01 2.657e+01 2.925e+01 4.625e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-14 05:59:29,638 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 from AS 2024-08-14 05:59:37,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2513520.0, ans=0.125 2024-08-14 05:59:42,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=12.0 2024-08-14 05:59:44,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.39 vs. limit=10.0 2024-08-14 05:59:44,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2513620.0, ans=0.125 2024-08-14 05:59:49,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2513620.0, ans=0.0 2024-08-14 05:59:58,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5000, loss[loss=0.1105, beats_loss=0.01055, ecapa_loss=0.0001711, whisper_loss=0.09828, over 20527.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001574, whisper_loss=0.09087, over 3887067.08 frames. 
], batch size: 83, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:00:18,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-14 06:00:22,522 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 19 from Vox, 23 from AS 2024-08-14 06:00:24,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2513820.0, ans=0.1 2024-08-14 06:00:31,402 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 from AS 2024-08-14 06:00:51,739 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:01:16,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5050, loss[loss=0.09337, beats_loss=0.01059, ecapa_loss=0.0001929, whisper_loss=0.08085, over 22751.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001591, whisper_loss=0.0913, over 3907105.45 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:01:29,109 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 from AS 2024-08-14 06:01:40,115 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
30 from LS+wenet, 26 from Vox, 38 from AS 2024-08-14 06:01:51,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2514420.0, ans=0.125 2024-08-14 06:02:05,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.401e+01 2.602e+01 2.912e+01 4.134e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-14 06:02:14,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2514520.0, ans=0.125 2024-08-14 06:02:25,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2514620.0, ans=0.0 2024-08-14 06:02:33,295 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 06:02:34,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5100, loss[loss=0.09237, beats_loss=0.01189, ecapa_loss=0.0001529, whisper_loss=0.07895, over 14051.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001589, whisper_loss=0.09163, over 3905868.46 frames. ], batch size: 55, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:02:41,939 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 from AS 2024-08-14 06:02:56,772 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 06:03:03,737 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 29 from Vox, 30 from AS 2024-08-14 06:03:18,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2514920.0, ans=0.0 2024-08-14 06:03:53,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5150, loss[loss=0.09885, beats_loss=0.01038, ecapa_loss=0.0001641, whisper_loss=0.08684, over 20836.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001588, whisper_loss=0.09195, over 3897128.59 frames. ], batch size: 83, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:04:06,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2515220.0, ans=0.125 2024-08-14 06:04:12,990 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 from AS 2024-08-14 06:04:21,739 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 19 from Vox, 42 from AS 2024-08-14 06:04:34,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2515420.0, ans=0.125 2024-08-14 06:04:42,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.426e+01 2.661e+01 3.204e+01 7.186e+01, threshold=5.323e+01, percent-clipped=2.0 2024-08-14 06:04:59,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2515620.0, ans=0.0 2024-08-14 06:05:08,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2515620.0, ans=0.125 2024-08-14 06:05:12,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5200, loss[loss=0.09242, beats_loss=0.0116, ecapa_loss=0.0001854, whisper_loss=0.07897, over 20346.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01062, ecapa_loss=0.0001595, whisper_loss=0.09182, over 3883614.77 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:05:27,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2515820.0, ans=0.0 2024-08-14 06:05:33,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.70 vs. 
limit=12.0 2024-08-14 06:05:37,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2515820.0, ans=0.125 2024-08-14 06:05:55,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2515920.0, ans=0.07 2024-08-14 06:06:09,282 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 from AS 2024-08-14 06:06:18,944 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-14 06:06:20,400 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 22 from Vox, 25 from AS 2024-08-14 06:06:32,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5250, loss[loss=0.1158, beats_loss=0.008116, ecapa_loss=0.0001927, whisper_loss=0.1057, over 19176.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01058, ecapa_loss=0.0001602, whisper_loss=0.09192, over 3902013.08 frames. ], batch size: 77, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:06:41,774 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 14 from Vox, 34 from AS 2024-08-14 06:06:43,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2516220.0, ans=0.125 2024-08-14 06:07:21,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.375e+01 2.671e+01 2.925e+01 9.126e+01, threshold=5.343e+01, percent-clipped=1.0 2024-08-14 06:07:34,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2516620.0, ans=0.125 2024-08-14 06:07:52,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5300, loss[loss=0.08121, beats_loss=0.01154, ecapa_loss=0.0001854, whisper_loss=0.06782, over 20677.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001591, whisper_loss=0.09193, over 3881398.67 frames. ], batch size: 91, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:08:07,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2024-08-14 06:08:08,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2516820.0, ans=0.0 2024-08-14 06:08:17,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2516820.0, ans=0.0 2024-08-14 06:08:20,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2516820.0, ans=0.0 2024-08-14 06:08:28,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2516920.0, ans=0.0 2024-08-14 06:08:48,855 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 29 from Vox, 38 from AS 2024-08-14 06:08:50,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2517020.0, ans=0.125 2024-08-14 06:08:52,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2517020.0, ans=0.125 2024-08-14 06:09:12,408 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5350, loss[loss=0.1076, beats_loss=0.01108, ecapa_loss=0.0001615, whisper_loss=0.09492, over 17744.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01053, ecapa_loss=0.0001596, whisper_loss=0.09196, over 3909359.84 frames. 
], batch size: 71, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:09:14,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2517220.0, ans=0.07 2024-08-14 06:09:25,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=22.5 2024-08-14 06:09:28,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2517320.0, ans=0.05 2024-08-14 06:09:31,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.23 vs. limit=22.5 2024-08-14 06:10:01,940 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.326e+01 2.604e+01 3.065e+01 1.793e+02, threshold=5.208e+01, percent-clipped=2.0 2024-08-14 06:10:13,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=12.0 2024-08-14 06:10:17,861 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 from AS 2024-08-14 06:10:18,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2517620.0, ans=0.0 2024-08-14 06:10:23,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2517620.0, ans=0.2 2024-08-14 06:10:28,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2517620.0, ans=0.0 2024-08-14 06:10:32,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5400, loss[loss=0.07898, beats_loss=0.01263, ecapa_loss=0.0001671, whisper_loss=0.06468, over 17982.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01052, ecapa_loss=0.0001595, whisper_loss=0.09205, over 3879147.69 frames. ], batch size: 76, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:10:49,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=12.0 2024-08-14 06:11:09,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2517920.0, ans=0.125 2024-08-14 06:11:25,828 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 13 from Vox, 37 from AS 2024-08-14 06:11:35,486 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:11:45,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2518120.0, ans=0.125 2024-08-14 06:11:50,785 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-14 06:11:51,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=15.0 2024-08-14 06:11:51,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5450, loss[loss=0.1088, beats_loss=0.01179, ecapa_loss=0.0001713, whisper_loss=0.09534, over 21907.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001591, whisper_loss=0.09128, over 3875449.05 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:11:56,785 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 16 from LS+wenet, 26 from Vox, 39 from AS 2024-08-14 06:11:58,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2518220.0, ans=0.0 2024-08-14 06:12:22,700 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 18 from Vox, 22 from AS 2024-08-14 06:12:31,104 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 from AS 2024-08-14 06:12:34,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2518420.0, ans=0.0 2024-08-14 06:12:41,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.369e+01 2.569e+01 2.930e+01 1.155e+02, threshold=5.138e+01, percent-clipped=3.0 2024-08-14 06:12:45,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2518520.0, ans=0.0 2024-08-14 06:12:58,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.96 vs. limit=22.5 2024-08-14 06:12:59,628 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 from AS 2024-08-14 06:13:10,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5500, loss[loss=0.1327, beats_loss=0.009618, ecapa_loss=0.0001998, whisper_loss=0.1211, over 21687.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.00016, whisper_loss=0.09104, over 3857495.28 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:13:15,459 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:13:22,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.49 vs. 
limit=12.0 2024-08-14 06:13:42,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2518920.0, ans=0.025 2024-08-14 06:13:42,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2518920.0, ans=0.0 2024-08-14 06:13:56,197 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 16 from Vox, 35 from AS 2024-08-14 06:14:14,702 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 38 from LS+wenet, 18 from Vox, 34 from AS 2024-08-14 06:14:16,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2519120.0, ans=0.125 2024-08-14 06:14:18,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2519120.0, ans=0.125 2024-08-14 06:14:19,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2519120.0, ans=0.0 2024-08-14 06:14:30,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5550, loss[loss=0.09254, beats_loss=0.01032, ecapa_loss=0.0001593, whisper_loss=0.08063, over 18231.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001603, whisper_loss=0.09124, over 3880826.48 frames. ], batch size: 72, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:14:31,247 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.258e+01 2024-08-14 06:14:39,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.96 vs. 
limit=15.0 2024-08-14 06:14:42,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2519220.0, ans=0.025 2024-08-14 06:15:04,797 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 from AS 2024-08-14 06:15:21,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.517e+01 2.810e+01 6.286e+01, threshold=5.034e+01, percent-clipped=1.0 2024-08-14 06:15:22,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2519520.0, ans=0.125 2024-08-14 06:15:22,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2519520.0, ans=0.1 2024-08-14 06:15:26,223 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 20 from LS+wenet, 34 from Vox, 42 from AS 2024-08-14 06:15:43,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2519620.0, ans=0.2 2024-08-14 06:15:50,521 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5600, loss[loss=0.1081, beats_loss=0.01071, ecapa_loss=0.0001575, whisper_loss=0.09581, over 22683.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001598, whisper_loss=0.08973, over 3863144.98 frames. ], batch size: 91, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:15:54,399 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 21 from Vox, 21 from AS 2024-08-14 06:15:57,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. 
limit=15.0 2024-08-14 06:16:08,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2519820.0, ans=0.0 2024-08-14 06:16:08,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2519820.0, ans=0.125 2024-08-14 06:16:29,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2024-08-14 06:16:30,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2519920.0, ans=0.0 2024-08-14 06:16:31,884 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-252000.pt 2024-08-14 06:16:37,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2519920.0, ans=0.0 2024-08-14 06:16:50,546 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS 2024-08-14 06:16:50,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2520020.0, ans=0.0 2024-08-14 06:16:50,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2024-08-14 06:16:56,866 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 from AS 2024-08-14 06:17:01,119 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
14 from LS+wenet, 16 from Vox, 28 from AS 2024-08-14 06:17:10,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5650, loss[loss=0.1191, beats_loss=0.008219, ecapa_loss=0.000172, whisper_loss=0.1091, over 17762.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01089, ecapa_loss=0.0001579, whisper_loss=0.08916, over 3886881.61 frames. ], batch size: 71, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:17:12,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2520220.0, ans=0.125 2024-08-14 06:17:35,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2520320.0, ans=0.125 2024-08-14 06:17:50,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2520420.0, ans=0.0 2024-08-14 06:17:56,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2520420.0, ans=0.0 2024-08-14 06:17:57,963 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 24 from Vox, 30 from AS 2024-08-14 06:18:00,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.353e+01 2.635e+01 2.874e+01 6.701e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-14 06:18:13,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-08-14 06:18:24,857 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
23 from LS+wenet, 20 from Vox, 43 from AS 2024-08-14 06:18:26,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2520620.0, ans=0.0 2024-08-14 06:18:32,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5700, loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001846, whisper_loss=0.0905, over 19175.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01086, ecapa_loss=0.0001579, whisper_loss=0.08982, over 3888471.42 frames. ], batch size: 80, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:18:34,421 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 06:18:43,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2520720.0, ans=0.125 2024-08-14 06:19:07,295 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 10 from LS+wenet, 24 from Vox, 32 from AS 2024-08-14 06:19:23,018 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 from AS 2024-08-14 06:19:52,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5750, loss[loss=0.1176, beats_loss=0.008347, ecapa_loss=0.0001897, whisper_loss=0.1073, over 16963.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.0001589, whisper_loss=0.09041, over 3882313.06 frames. ], batch size: 66, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:19:57,804 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 14 from Vox, 24 from AS 2024-08-14 06:19:58,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2521220.0, ans=0.0 2024-08-14 06:20:07,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2521320.0, ans=0.125 2024-08-14 06:20:11,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-08-14 06:20:29,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2521420.0, ans=0.0 2024-08-14 06:20:35,201 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 from AS 2024-08-14 06:20:41,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.372e+01 2.640e+01 2.859e+01 6.893e+01, threshold=5.281e+01, percent-clipped=1.0 2024-08-14 06:20:48,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2521520.0, ans=0.0 2024-08-14 06:21:12,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5800, loss[loss=0.08045, beats_loss=0.0111, ecapa_loss=0.0001537, whisper_loss=0.06781, over 16610.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01081, ecapa_loss=0.0001586, whisper_loss=0.09013, over 3879564.08 frames. ], batch size: 65, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:21:43,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. 
limit=15.0 2024-08-14 06:21:52,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2521920.0, ans=0.0 2024-08-14 06:22:05,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2522020.0, ans=0.125 2024-08-14 06:22:07,033 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS 2024-08-14 06:22:12,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2522120.0, ans=0.125 2024-08-14 06:22:18,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=12.0 2024-08-14 06:22:26,925 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5850, loss[loss=0.09502, beats_loss=0.01238, ecapa_loss=0.0001521, whisper_loss=0.08112, over 17962.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.000158, whisper_loss=0.09031, over 3876008.26 frames. ], batch size: 72, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:22:57,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2522420.0, ans=0.125 2024-08-14 06:22:58,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2522420.0, ans=0.1 2024-08-14 06:23:02,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2522420.0, ans=0.125 2024-08-14 06:23:10,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.428e+01 2.673e+01 2.941e+01 3.816e+01, threshold=5.346e+01, percent-clipped=0.0 2024-08-14 06:23:23,725 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 06:23:28,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2522620.0, ans=0.0 2024-08-14 06:23:30,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2024-08-14 06:23:38,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5900, loss[loss=0.1128, beats_loss=0.009307, ecapa_loss=0.0001808, whisper_loss=0.1017, over 22703.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001588, whisper_loss=0.0906, over 3858825.88 frames. ], batch size: 94, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:23:51,012 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 06:24:30,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2523020.0, ans=0.125 2024-08-14 06:24:36,553 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 06:24:40,750 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 23 from Vox, 16 fro AS 2024-08-14 06:24:46,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2523220.0, ans=0.04949747468305833 2024-08-14 06:24:47,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 5950, loss[loss=0.09683, beats_loss=0.01069, ecapa_loss=0.0001811, whisper_loss=0.08433, over 21769.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001603, whisper_loss=0.0902, over 3882822.36 frames. 
], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:25:06,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2523320.0, ans=0.125 2024-08-14 06:25:07,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2523320.0, ans=0.1 2024-08-14 06:25:18,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-14 06:25:20,740 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 9 from Vox, 33 fro AS 2024-08-14 06:25:23,489 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 06:25:29,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.835e+05 2024-08-14 06:25:30,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.432e+01 2.806e+01 3.149e+01 6.455e+01, threshold=5.612e+01, percent-clipped=2.0 2024-08-14 06:25:36,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2523520.0, ans=0.125 2024-08-14 06:25:39,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2523520.0, ans=0.0 2024-08-14 06:25:43,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2523620.0, ans=0.1 2024-08-14 06:25:47,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2523620.0, ans=0.125 2024-08-14 06:25:48,810 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
24 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-14 06:25:56,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6000, loss[loss=0.101, beats_loss=0.01221, ecapa_loss=0.0001615, whisper_loss=0.08715, over 20474.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001585, whisper_loss=0.09024, over 3879450.86 frames. ], batch size: 85, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:25:56,746 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 06:26:36,863 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on ASR_libri: loss=0.2513, beats_loss=0, ecapa_loss=0.0005424, whisper_loss=0.2459, over 922467.00 frames. 2024-08-14 06:26:55,961 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on SV_voxceleb1: loss=0.004393, beats_loss=0, ecapa_loss=0.0004393, whisper_loss=0, over 939242.00 frames. 2024-08-14 06:28:56,613 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on AT_audioset: loss=0.02347, beats_loss=0.02347, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 06:28:56,618 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 06:28:58,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.74 vs. limit=6.0 2024-08-14 06:29:02,316 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-14 06:30:05,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6050, loss[loss=0.1065, beats_loss=0.01033, ecapa_loss=0.0001677, whisper_loss=0.09445, over 15387.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001584, whisper_loss=0.09115, over 3891304.83 frames. 
], batch size: 61, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:30:18,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-14 06:30:29,414 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 06:30:42,390 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 06:30:49,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.348e+01 2.542e+01 2.875e+01 5.513e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 06:30:56,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2024-08-14 06:31:07,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2524620.0, ans=0.04949747468305833 2024-08-14 06:31:15,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6100, loss[loss=0.1092, beats_loss=0.008246, ecapa_loss=0.0002278, whisper_loss=0.09865, over 16724.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.00016, whisper_loss=0.09084, over 3849077.74 frames. ], batch size: 68, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:31:26,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2524720.0, ans=0.125 2024-08-14 06:31:29,500 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 06:31:31,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2524820.0, ans=0.05 2024-08-14 06:31:35,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2524820.0, ans=0.2 2024-08-14 06:32:00,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2525020.0, ans=0.0 2024-08-14 06:32:25,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6150, loss[loss=0.1269, beats_loss=0.009772, ecapa_loss=0.0001799, whisper_loss=0.1154, over 22079.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01082, ecapa_loss=0.0001603, whisper_loss=0.09057, over 3851070.99 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:32:27,953 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 06:32:28,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2024-08-14 06:32:55,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2525420.0, ans=0.0 2024-08-14 06:33:10,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.296e+01 2.588e+01 2.950e+01 9.161e+01, threshold=5.175e+01, percent-clipped=1.0 2024-08-14 06:33:16,468 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 06:33:16,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2525520.0, ans=0.2 2024-08-14 06:33:22,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2525620.0, ans=0.125 2024-08-14 06:33:29,178 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 32 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 06:33:34,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=12.0 2024-08-14 06:33:37,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6200, loss[loss=0.1029, beats_loss=0.01246, ecapa_loss=0.0001484, whisper_loss=0.08899, over 22280.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001599, whisper_loss=0.09104, over 3850920.48 frames. ], batch size: 90, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:33:41,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2525720.0, ans=0.125 2024-08-14 06:34:04,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2525820.0, ans=0.125 2024-08-14 06:34:17,263 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 06:34:17,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2525920.0, ans=0.1 2024-08-14 06:34:24,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-08-14 06:34:44,760 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 06:34:49,627 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 06:34:54,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6250, loss[loss=0.09767, beats_loss=0.01098, ecapa_loss=0.0001735, whisper_loss=0.08495, over 21155.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001593, whisper_loss=0.09147, over 3884316.92 frames. ], batch size: 87, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:35:16,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0 2024-08-14 06:35:20,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2526320.0, ans=0.0 2024-08-14 06:35:28,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2526420.0, ans=0.125 2024-08-14 06:35:34,816 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 06:35:35,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2526420.0, ans=0.1 2024-08-14 06:35:41,447 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 06:35:44,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.485e+01 2.719e+01 3.146e+01 4.092e+01, threshold=5.438e+01, percent-clipped=0.0 2024-08-14 06:35:46,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=2526520.0, ans=0.02 2024-08-14 06:35:48,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.42 vs. 
limit=22.5 2024-08-14 06:35:50,628 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 06:36:10,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=12.0 2024-08-14 06:36:12,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6300, loss[loss=0.08895, beats_loss=0.009966, ecapa_loss=0.0001682, whisper_loss=0.0773, over 21719.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001592, whisper_loss=0.09142, over 3876316.49 frames. ], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:36:20,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.64 vs. limit=22.5 2024-08-14 06:36:33,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2526820.0, ans=0.125 2024-08-14 06:36:39,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2526820.0, ans=0.0 2024-08-14 06:36:43,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2526920.0, ans=0.125 2024-08-14 06:36:44,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2526920.0, ans=0.125 2024-08-14 06:36:51,729 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-14 06:36:55,672 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 06:37:16,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.90 vs. 
limit=15.0 2024-08-14 06:37:25,351 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 06:37:25,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2527120.0, ans=0.0 2024-08-14 06:37:30,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6350, loss[loss=0.1175, beats_loss=0.009123, ecapa_loss=0.0001636, whisper_loss=0.1068, over 15500.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01065, ecapa_loss=0.0001597, whisper_loss=0.09166, over 3889131.88 frames. ], batch size: 60, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:37:40,184 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 39 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 06:37:58,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2527320.0, ans=0.125 2024-08-14 06:38:19,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.286e+01 2.522e+01 2.892e+01 3.872e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 06:38:20,166 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 06:38:24,384 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 06:38:38,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2527620.0, ans=0.025 2024-08-14 06:38:41,977 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 28 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 06:38:47,910 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6400, loss[loss=0.1071, beats_loss=0.01349, ecapa_loss=0.000142, whisper_loss=0.09215, over 22092.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001591, whisper_loss=0.09185, over 3882077.24 frames. 
], batch size: 90, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:38:51,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2527720.0, ans=0.2 2024-08-14 06:39:02,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2527820.0, ans=0.0 2024-08-14 06:39:05,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-08-14 06:39:39,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2528020.0, ans=0.0 2024-08-14 06:40:06,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6450, loss[loss=0.1388, beats_loss=0.008745, ecapa_loss=0.0001742, whisper_loss=0.1284, over 19006.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001577, whisper_loss=0.09169, over 3892589.98 frames. ], batch size: 74, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:40:13,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2528220.0, ans=15.0 2024-08-14 06:40:14,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2528220.0, ans=0.125 2024-08-14 06:40:22,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2024-08-14 06:40:32,796 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 06:40:40,282 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 06:40:56,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.368e+01 2.657e+01 3.046e+01 7.930e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-14 06:41:17,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2528620.0, ans=0.125 2024-08-14 06:41:24,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6500, loss[loss=0.0927, beats_loss=0.01068, ecapa_loss=0.000193, whisper_loss=0.08009, over 16235.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01072, ecapa_loss=0.0001566, whisper_loss=0.09185, over 3895419.75 frames. ], batch size: 68, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:41:31,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0 2024-08-14 06:41:35,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2528720.0, ans=0.0 2024-08-14 06:41:36,155 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 06:42:19,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2529020.0, ans=0.0 2024-08-14 06:42:43,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6550, loss[loss=0.1161, beats_loss=0.01075, ecapa_loss=0.0001563, whisper_loss=0.1038, over 21980.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.000157, whisper_loss=0.09155, over 3898663.04 frames. 
], batch size: 88, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:42:46,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2529220.0, ans=0.125 2024-08-14 06:43:03,542 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:43:03,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2529320.0, ans=0.1 2024-08-14 06:43:21,607 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 06:43:21,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2529420.0, ans=0.2 2024-08-14 06:43:22,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2529420.0, ans=0.125 2024-08-14 06:43:35,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.448e+01 2.627e+01 2.898e+01 7.209e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-14 06:43:49,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2529620.0, ans=0.0 2024-08-14 06:44:04,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2529720.0, ans=0.0 2024-08-14 06:44:05,896 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6600, loss[loss=0.1167, beats_loss=0.009358, ecapa_loss=0.0001786, whisper_loss=0.1056, over 22955.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0106, ecapa_loss=0.0001588, whisper_loss=0.0927, over 3893944.60 frames. 
], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:44:39,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2529920.0, ans=0.1 2024-08-14 06:44:48,921 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 06:44:51,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2529920.0, ans=0.125 2024-08-14 06:44:59,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2530020.0, ans=0.125 2024-08-14 06:45:11,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2530120.0, ans=0.0 2024-08-14 06:45:25,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2024-08-14 06:45:28,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6650, loss[loss=0.09802, beats_loss=0.0116, ecapa_loss=0.0001694, whisper_loss=0.08473, over 21734.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01053, ecapa_loss=0.0001607, whisper_loss=0.09324, over 3908190.61 frames. ], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:45:43,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2530320.0, ans=0.125 2024-08-14 06:45:52,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2530320.0, ans=0.125 2024-08-14 06:45:58,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. 
limit=10.0 2024-08-14 06:46:04,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2530420.0, ans=0.125 2024-08-14 06:46:20,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.583e+01 2.896e+01 3.977e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 06:46:28,705 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 06:46:48,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6700, loss[loss=0.1191, beats_loss=0.008997, ecapa_loss=0.0001597, whisper_loss=0.1085, over 20734.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01054, ecapa_loss=0.0001614, whisper_loss=0.09304, over 3901507.35 frames. ], batch size: 80, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:46:56,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-14 06:47:09,843 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 06:47:15,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.42 vs. limit=22.5 2024-08-14 06:47:16,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2530820.0, ans=0.1 2024-08-14 06:47:20,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2530920.0, ans=0.025 2024-08-14 06:47:33,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2530920.0, ans=0.0 2024-08-14 06:47:39,650 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 06:48:16,238 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6750, loss[loss=0.09586, beats_loss=0.01228, ecapa_loss=0.000112, whisper_loss=0.08246, over 16821.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01056, ecapa_loss=0.0001625, whisper_loss=0.09207, over 3888498.20 frames. ], batch size: 63, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:48:20,352 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 06:48:25,312 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.153e+01 2024-08-14 06:48:42,478 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 06:48:44,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2531320.0, ans=0.0 2024-08-14 06:48:45,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2531320.0, ans=0.125 2024-08-14 06:48:51,403 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 06:49:08,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.298e+01 2.539e+01 2.885e+01 4.400e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-14 06:49:13,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2531520.0, ans=0.2 2024-08-14 06:49:38,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2531720.0, ans=0.125 2024-08-14 06:49:39,211 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6800, loss[loss=0.1103, beats_loss=0.01052, ecapa_loss=0.0001616, whisper_loss=0.09818, over 21984.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01057, ecapa_loss=0.0001611, whisper_loss=0.09224, over 3888936.68 frames. ], batch size: 88, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:49:56,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2531820.0, ans=0.1 2024-08-14 06:50:18,333 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 06:50:55,892 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 06:51:09,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6850, loss[loss=0.109, beats_loss=0.01127, ecapa_loss=0.0001727, whisper_loss=0.09599, over 22203.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01062, ecapa_loss=0.0001592, whisper_loss=0.09244, over 3881967.44 frames. ], batch size: 93, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:51:28,727 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-14 06:51:35,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2532320.0, ans=0.1 2024-08-14 06:51:37,873 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 06:51:50,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2532420.0, ans=0.0 2024-08-14 06:52:03,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.338e+01 2.591e+01 2.972e+01 6.425e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-14 06:52:09,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2532520.0, ans=0.125 2024-08-14 06:52:22,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-14 06:52:23,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2532620.0, ans=0.1 2024-08-14 06:52:32,998 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 06:52:33,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2532620.0, ans=0.125 2024-08-14 06:52:35,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2532620.0, ans=0.125 2024-08-14 06:52:40,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6900, loss[loss=0.08481, beats_loss=0.01443, ecapa_loss=0.0001439, whisper_loss=0.06893, over 16582.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.0001588, whisper_loss=0.0919, over 3867013.01 frames. ], batch size: 68, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:53:18,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. 
limit=15.0 2024-08-14 06:53:32,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2532920.0, ans=0.125 2024-08-14 06:53:53,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2533020.0, ans=0.2 2024-08-14 06:54:10,105 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-14 06:54:28,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2533220.0, ans=0.0 2024-08-14 06:54:28,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2533220.0, ans=0.2 2024-08-14 06:54:30,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 6950, loss[loss=0.1096, beats_loss=0.009365, ecapa_loss=0.0001902, whisper_loss=0.09835, over 18378.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001588, whisper_loss=0.09217, over 3869638.95 frames. ], batch size: 76, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:54:31,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2533220.0, ans=0.1 2024-08-14 06:54:32,621 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 06:54:58,530 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 06:55:05,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2533320.0, ans=0.2 2024-08-14 06:55:16,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=2533420.0, ans=0.1 2024-08-14 06:55:41,203 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.379e+01 2.570e+01 2.940e+01 3.906e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-14 06:55:50,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2533520.0, ans=0.02 2024-08-14 06:55:58,851 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 06:56:12,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533620.0, ans=0.1 2024-08-14 06:56:20,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7000, loss[loss=0.1128, beats_loss=0.01107, ecapa_loss=0.0001502, whisper_loss=0.1002, over 22370.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01069, ecapa_loss=0.0001587, whisper_loss=0.09214, over 3862334.55 frames. ], batch size: 93, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:57:08,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. limit=10.0 2024-08-14 06:57:14,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2533920.0, ans=0.125 2024-08-14 06:57:16,817 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 06:57:29,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2534020.0, ans=0.125 2024-08-14 06:57:47,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2534120.0, ans=0.0 2024-08-14 06:57:53,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2534120.0, ans=0.0 2024-08-14 06:58:01,130 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7050, loss[loss=0.09904, beats_loss=0.01081, ecapa_loss=0.0001747, whisper_loss=0.08648, over 21449.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001597, whisper_loss=0.09167, over 3895772.36 frames. ], batch size: 87, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:58:16,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2534320.0, ans=0.125 2024-08-14 06:58:16,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2534320.0, ans=0.125 2024-08-14 06:58:28,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2534320.0, ans=0.125 2024-08-14 06:58:34,075 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 06:58:34,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2534420.0, ans=0.5 2024-08-14 06:58:46,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2534520.0, ans=0.2 2024-08-14 06:58:48,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.340e+01 2.637e+01 3.082e+01 1.011e+02, threshold=5.275e+01, percent-clipped=1.0 2024-08-14 06:59:10,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2534620.0, ans=0.125 2024-08-14 06:59:11,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2534620.0, ans=0.125 2024-08-14 06:59:14,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7100, loss[loss=0.09834, beats_loss=0.009269, ecapa_loss=0.0001468, whisper_loss=0.0876, over 18042.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01075, ecapa_loss=0.0001571, whisper_loss=0.09141, over 3894977.35 frames. ], batch size: 71, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:59:31,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-08-14 06:59:41,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2534820.0, ans=0.1 2024-08-14 06:59:57,316 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 07:00:21,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2535120.0, ans=0.125 2024-08-14 07:00:27,182 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 07:00:31,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7150, loss[loss=0.08819, beats_loss=0.01413, ecapa_loss=0.0001321, whisper_loss=0.07274, over 18700.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.000157, whisper_loss=0.09124, over 3915509.21 frames. ], batch size: 74, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:00:43,804 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 24 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-14 07:00:44,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=15.0 2024-08-14 07:00:56,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2024-08-14 07:01:14,458 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 07:01:20,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.290e+01 2.562e+01 2.920e+01 7.577e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 07:01:22,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2535520.0, ans=0.125 2024-08-14 07:01:43,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-14 07:01:44,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2535620.0, ans=0.125 2024-08-14 07:01:47,517 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7200, loss[loss=0.1056, beats_loss=0.01004, ecapa_loss=0.0001478, whisper_loss=0.09404, over 18510.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01081, ecapa_loss=0.0001579, whisper_loss=0.09022, over 3920254.87 frames. ], batch size: 74, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:01:56,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-14 07:02:04,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2535820.0, ans=0.1 2024-08-14 07:02:14,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2535820.0, ans=0.1 2024-08-14 07:02:37,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2536020.0, ans=0.0 2024-08-14 07:02:47,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2536020.0, ans=0.125 2024-08-14 07:02:53,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2536120.0, ans=0.0 2024-08-14 07:02:54,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2536120.0, ans=0.2 2024-08-14 07:03:04,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7250, loss[loss=0.1181, beats_loss=0.01051, ecapa_loss=0.0001677, whisper_loss=0.1059, over 22373.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01082, ecapa_loss=0.000158, whisper_loss=0.08966, over 3907706.82 frames. ], batch size: 87, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:03:13,348 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 07:03:16,270 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 07:03:17,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2536220.0, ans=0.09899494936611666 2024-08-14 07:03:17,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2536220.0, ans=0.0 2024-08-14 07:03:27,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2536320.0, ans=15.0 2024-08-14 07:03:44,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-14 07:03:50,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=12.0 2024-08-14 07:03:52,689 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.501e+01 2024-08-14 07:03:55,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.431e+01 2.606e+01 2.894e+01 4.565e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-14 07:04:00,120 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 07:04:09,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2536620.0, ans=0.125 2024-08-14 07:04:22,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2536720.0, ans=0.05 2024-08-14 07:04:22,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7300, loss[loss=0.1129, beats_loss=0.00949, ecapa_loss=0.0001727, whisper_loss=0.1017, over 19028.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001567, whisper_loss=0.09031, over 3899237.55 frames. ], batch size: 75, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:04:26,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2536720.0, ans=0.2 2024-08-14 07:04:37,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2536820.0, ans=0.2 2024-08-14 07:04:49,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2024-08-14 07:04:52,035 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 07:04:53,285 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 07:05:08,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2537020.0, ans=0.125 2024-08-14 07:05:10,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0 2024-08-14 07:05:23,809 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 07:05:34,352 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 07:05:38,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7350, loss[loss=0.08315, beats_loss=0.01276, ecapa_loss=0.0001548, whisper_loss=0.06883, over 22283.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01084, ecapa_loss=0.0001576, whisper_loss=0.09044, over 3899567.03 frames. 
], batch size: 93, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:05:40,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2537220.0, ans=0.0 2024-08-14 07:05:44,141 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.799e+00 2024-08-14 07:06:18,108 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 34 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 07:06:26,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.465e+01 2.701e+01 2.899e+01 2.044e+02, threshold=5.402e+01, percent-clipped=2.0 2024-08-14 07:06:28,530 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 10 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 07:06:47,591 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 07:06:54,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7400, loss[loss=0.1231, beats_loss=0.01039, ecapa_loss=0.0001436, whisper_loss=0.1113, over 20269.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001583, whisper_loss=0.09095, over 3879161.50 frames. ], batch size: 80, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:07:03,019 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 24 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-14 07:07:14,320 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 07:07:23,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2537820.0, ans=0.125 2024-08-14 07:07:29,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2537920.0, ans=0.1 2024-08-14 07:07:43,638 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 07:07:48,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2538020.0, ans=0.125 2024-08-14 07:07:55,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2538120.0, ans=0.07 2024-08-14 07:08:02,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-08-14 07:08:12,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7450, loss[loss=0.1184, beats_loss=0.009831, ecapa_loss=0.0001793, whisper_loss=0.1068, over 20535.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001598, whisper_loss=0.09076, over 3869866.86 frames. ], batch size: 81, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:08:15,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-14 07:08:39,982 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 07:09:05,385 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.389e+01 2.665e+01 3.000e+01 5.031e+01, threshold=5.329e+01, percent-clipped=0.0 2024-08-14 07:09:05,558 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 07:09:10,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2538520.0, ans=0.125 2024-08-14 07:09:33,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7500, loss[loss=0.1163, beats_loss=0.01221, ecapa_loss=0.0001359, whisper_loss=0.1028, over 22142.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001598, whisper_loss=0.0913, over 3908247.43 frames. ], batch size: 86, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:09:34,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2538720.0, ans=0.125 2024-08-14 07:09:37,979 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 07:09:39,702 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 07:09:50,427 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 07:09:55,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2538820.0, ans=0.125 2024-08-14 07:10:03,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2538820.0, ans=0.1 2024-08-14 07:10:17,585 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 07:10:30,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2539020.0, ans=0.2 2024-08-14 07:10:35,728 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 07:10:37,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2539120.0, ans=0.125 2024-08-14 07:10:44,750 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 07:10:54,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7550, loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001716, whisper_loss=0.09205, over 21481.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001601, whisper_loss=0.09096, over 3859222.13 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:11:16,364 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:11:17,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2539320.0, ans=0.05 2024-08-14 07:11:29,600 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-14 07:11:33,809 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 07:11:44,643 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-14 07:11:46,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.298e+01 2.593e+01 2.946e+01 4.435e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 07:11:56,276 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 07:12:06,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2539620.0, ans=6.0 2024-08-14 07:12:12,718 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 07:12:15,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7600, loss[loss=0.1041, beats_loss=0.01106, ecapa_loss=0.0001367, whisper_loss=0.09171, over 21280.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001599, whisper_loss=0.09118, over 3862623.18 frames. 
], batch size: 85, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:12:15,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2539720.0, ans=0.125 2024-08-14 07:12:43,100 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 07:12:47,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2539920.0, ans=0.125 2024-08-14 07:12:52,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2539920.0, ans=0.1 2024-08-14 07:13:06,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5 2024-08-14 07:13:07,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2540020.0, ans=0.1 2024-08-14 07:13:16,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-08-14 07:13:33,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7650, loss[loss=0.09912, beats_loss=0.01302, ecapa_loss=0.0001432, whisper_loss=0.08467, over 21989.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001598, whisper_loss=0.09071, over 3830884.78 frames. ], batch size: 88, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:13:35,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2540220.0, ans=0.0 2024-08-14 07:13:40,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. 
limit=6.0 2024-08-14 07:14:16,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2540420.0, ans=0.1 2024-08-14 07:14:23,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2540520.0, ans=0.2 2024-08-14 07:14:25,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.299e+01 2.494e+01 2.918e+01 5.997e+01, threshold=4.989e+01, percent-clipped=1.0 2024-08-14 07:14:53,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7700, loss[loss=0.1031, beats_loss=0.01117, ecapa_loss=0.0001769, whisper_loss=0.09013, over 22264.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.00016, whisper_loss=0.09108, over 3848497.83 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:15:03,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-14 07:15:08,228 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 07:15:14,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2540820.0, ans=0.2 2024-08-14 07:15:34,649 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 07:15:36,201 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 07:15:45,520 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:15:54,119 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-14 07:15:55,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2541020.0, ans=0.125 2024-08-14 07:15:57,311 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 07:16:13,432 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7750, loss[loss=0.1073, beats_loss=0.01229, ecapa_loss=0.0001645, whisper_loss=0.09335, over 23133.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001584, whisper_loss=0.09071, over 3863705.82 frames. ], batch size: 93, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:16:38,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=12.0 2024-08-14 07:16:58,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2541420.0, ans=0.2 2024-08-14 07:17:04,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.379e+01 2.592e+01 2.812e+01 4.047e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-14 07:17:10,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2541520.0, ans=0.0 2024-08-14 07:17:17,667 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 07:17:29,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2541720.0, ans=0.1 2024-08-14 07:17:30,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7800, loss[loss=0.09493, beats_loss=0.01239, ecapa_loss=0.0001633, whisper_loss=0.0809, over 21630.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001584, whisper_loss=0.0911, over 3891778.12 frames. 
], batch size: 90, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:17:36,529 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-14 07:18:06,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2024-08-14 07:18:11,751 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 07:18:16,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2542020.0, ans=0.0 2024-08-14 07:18:28,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2542120.0, ans=0.125 2024-08-14 07:18:36,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2542120.0, ans=0.125 2024-08-14 07:18:42,528 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 07:18:42,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2542220.0, ans=0.125 2024-08-14 07:18:43,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7850, loss[loss=0.1262, beats_loss=0.007866, ecapa_loss=0.0001513, whisper_loss=0.1168, over 22690.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001587, whisper_loss=0.09155, over 3897322.26 frames. ], batch size: 86, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:18:45,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2542220.0, ans=0.125 2024-08-14 07:18:48,267 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 07:19:29,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.387e+01 2.602e+01 2.902e+01 1.105e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 07:19:44,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2542620.0, ans=0.125 2024-08-14 07:19:54,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7900, loss[loss=0.117, beats_loss=0.01075, ecapa_loss=0.0001362, whisper_loss=0.1049, over 22891.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001581, whisper_loss=0.09185, over 3909609.30 frames. ], batch size: 89, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:20:24,692 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 07:20:28,856 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04889056831598282, model_norm_threshold=52.03104019165039 2024-08-14 07:20:29,045 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.668e+05, grad_sumsq=1.668e+05, orig_rms_sq=1.000e+00 2024-08-14 07:20:51,253 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 07:21:02,655 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 07:21:06,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 7950, loss[loss=0.1082, beats_loss=0.01146, ecapa_loss=0.0001513, whisper_loss=0.09524, over 20726.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001582, whisper_loss=0.09124, over 3896906.90 frames. 
], batch size: 81, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:21:08,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2543220.0, ans=0.2 2024-08-14 07:21:26,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2543320.0, ans=0.0 2024-08-14 07:21:38,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2543420.0, ans=0.125 2024-08-14 07:21:49,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2543420.0, ans=0.1 2024-08-14 07:21:54,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.373e+01 2.668e+01 3.252e+01 1.064e+03, threshold=5.336e+01, percent-clipped=2.0 2024-08-14 07:22:19,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8000, loss[loss=0.0955, beats_loss=0.01169, ecapa_loss=0.000184, whisper_loss=0.08198, over 22032.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.000158, whisper_loss=0.09192, over 3897826.99 frames. ], batch size: 94, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:22:23,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2024-08-14 07:22:24,378 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 12 from Vox, 48 fro AS 2024-08-14 07:22:25,801 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 07:22:48,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2543920.0, ans=0.5 2024-08-14 07:23:00,071 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 07:23:10,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2544020.0, ans=0.0 2024-08-14 07:23:33,341 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8050, loss[loss=0.1118, beats_loss=0.01083, ecapa_loss=0.0001384, whisper_loss=0.09962, over 19926.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01069, ecapa_loss=0.0001583, whisper_loss=0.09228, over 3909418.14 frames. ], batch size: 78, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:23:38,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2544220.0, ans=0.1 2024-08-14 07:23:47,034 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 07:24:14,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2024-08-14 07:24:20,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.420e+01 2.579e+01 3.062e+01 1.369e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-14 07:24:20,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2544520.0, ans=0.125 2024-08-14 07:24:24,332 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-14 07:24:41,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2544620.0, ans=0.125 2024-08-14 07:24:45,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8100, loss[loss=0.1118, beats_loss=0.01016, ecapa_loss=0.0001491, whisper_loss=0.1002, over 22455.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001571, whisper_loss=0.09204, over 3913963.43 frames. 
], batch size: 89, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:24:48,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2544720.0, ans=0.1 2024-08-14 07:24:54,115 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 07:25:00,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. limit=10.0 2024-08-14 07:25:02,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2024-08-14 07:25:06,724 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 07:25:15,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2544920.0, ans=0.125 2024-08-14 07:25:18,595 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 20 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-14 07:25:34,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2545020.0, ans=0.125 2024-08-14 07:25:36,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2545020.0, ans=0.125 2024-08-14 07:25:55,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8150, loss[loss=0.08127, beats_loss=0.01174, ecapa_loss=0.0001614, whisper_loss=0.06792, over 13777.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01069, ecapa_loss=0.0001581, whisper_loss=0.09217, over 3925617.30 frames. 
], batch size: 58, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:26:06,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2545220.0, ans=0.125 2024-08-14 07:26:08,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2545320.0, ans=0.1 2024-08-14 07:26:20,206 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 07:26:32,897 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 07:26:36,860 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 07:26:41,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.412e+01 2.672e+01 3.051e+01 4.273e+01, threshold=5.344e+01, percent-clipped=0.0 2024-08-14 07:26:44,053 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 07:26:52,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2545620.0, ans=0.125 2024-08-14 07:26:58,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2545620.0, ans=0.1 2024-08-14 07:26:58,517 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:27:00,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2545620.0, ans=0.0 2024-08-14 07:27:00,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2545620.0, ans=0.125 2024-08-14 07:27:04,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2545620.0, ans=0.125 2024-08-14 07:27:06,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8200, loss[loss=0.09889, beats_loss=0.01048, ecapa_loss=0.0001552, whisper_loss=0.08686, over 22387.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01063, ecapa_loss=0.0001578, whisper_loss=0.09254, over 3916458.95 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:27:11,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-14 07:27:12,302 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 07:27:31,563 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 07:27:35,501 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 07:27:40,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2024-08-14 07:27:47,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2024-08-14 07:27:49,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=12.0 2024-08-14 07:28:09,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2546120.0, ans=0.1 2024-08-14 07:28:09,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2546120.0, ans=0.125 2024-08-14 07:28:14,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5 2024-08-14 07:28:18,244 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8250, loss[loss=0.09679, beats_loss=0.01215, ecapa_loss=0.0001623, whisper_loss=0.08302, over 14974.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001578, whisper_loss=0.09205, over 3902191.11 frames. ], batch size: 59, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:28:22,363 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 07:28:30,982 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 07:28:37,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2546320.0, ans=0.125 2024-08-14 07:29:03,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.321e+01 2.631e+01 2.915e+01 1.588e+02, threshold=5.262e+01, percent-clipped=1.0 2024-08-14 07:29:05,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2546520.0, ans=0.0 2024-08-14 07:29:08,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2546520.0, ans=0.2 2024-08-14 07:29:12,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2546520.0, ans=0.1 2024-08-14 07:29:16,733 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 07:29:19,751 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 07:29:23,335 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-14 07:29:32,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8300, loss[loss=0.1077, beats_loss=0.01036, ecapa_loss=0.0001557, whisper_loss=0.09576, over 22228.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01068, ecapa_loss=0.0001569, whisper_loss=0.09213, over 3898446.90 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:29:32,464 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 07:29:35,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2546720.0, ans=0.1 2024-08-14 07:29:38,709 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 07:29:42,813 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 07:29:52,291 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 07:30:02,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2024-08-14 07:30:10,744 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-14 07:30:30,500 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 07:30:31,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2547020.0, ans=0.125 2024-08-14 07:30:43,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2547120.0, ans=0.125 2024-08-14 07:30:48,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8350, loss[loss=0.1083, beats_loss=0.009994, ecapa_loss=0.0001222, whisper_loss=0.09712, over 23353.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001565, whisper_loss=0.09205, over 3896824.81 frames. 
], batch size: 90, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:30:57,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2547220.0, ans=0.0 2024-08-14 07:31:25,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2547420.0, ans=0.125 2024-08-14 07:31:29,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2547420.0, ans=0.2 2024-08-14 07:31:35,331 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.329e+01 2.543e+01 2.806e+01 3.860e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-14 07:31:43,164 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.987e+05 2024-08-14 07:31:45,858 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-14 07:31:46,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2547620.0, ans=0.1 2024-08-14 07:32:01,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8400, loss[loss=0.09346, beats_loss=0.01192, ecapa_loss=0.0001429, whisper_loss=0.08011, over 19183.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001563, whisper_loss=0.09154, over 3884809.50 frames. 
], batch size: 78, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:32:06,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2547720.0, ans=0.0 2024-08-14 07:32:14,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2547720.0, ans=10.0 2024-08-14 07:32:14,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2024-08-14 07:32:27,336 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 07:32:56,824 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 07:32:57,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2548020.0, ans=0.0 2024-08-14 07:32:59,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2548120.0, ans=0.07 2024-08-14 07:33:13,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8450, loss[loss=0.09116, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.07938, over 17521.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001573, whisper_loss=0.09092, over 3865699.48 frames. ], batch size: 70, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:33:27,245 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 07:33:37,404 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 07:33:43,230 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
34 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 07:33:44,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2548420.0, ans=0.125 2024-08-14 07:33:54,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2548420.0, ans=0.5 2024-08-14 07:33:59,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.373e+01 2.582e+01 3.046e+01 4.610e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 07:34:16,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2548620.0, ans=0.125 2024-08-14 07:34:25,517 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8500, loss[loss=0.08032, beats_loss=0.009816, ecapa_loss=0.0001883, whisper_loss=0.06862, over 15042.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001574, whisper_loss=0.09083, over 3878541.77 frames. ], batch size: 62, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:34:34,651 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 07:34:40,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2548820.0, ans=0.0 2024-08-14 07:34:48,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2548820.0, ans=0.0 2024-08-14 07:34:56,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2548920.0, ans=0.125 2024-08-14 07:35:17,844 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 07:35:30,399 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
18 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-14 07:35:33,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2549120.0, ans=0.0 2024-08-14 07:35:36,233 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8550, loss[loss=0.117, beats_loss=0.00755, ecapa_loss=0.0002079, whisper_loss=0.1074, over 18632.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001574, whisper_loss=0.09113, over 3877758.02 frames. ], batch size: 75, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:35:38,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2549220.0, ans=0.0 2024-08-14 07:35:42,272 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 07:35:42,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2549220.0, ans=0.1 2024-08-14 07:35:49,126 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 07:35:54,680 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 07:36:09,345 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
34 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 07:36:13,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2549420.0, ans=0.2 2024-08-14 07:36:16,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2549420.0, ans=0.125 2024-08-14 07:36:16,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2549420.0, ans=0.0 2024-08-14 07:36:20,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2549520.0, ans=0.0 2024-08-14 07:36:22,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.413e+01 2.608e+01 2.932e+01 1.178e+02, threshold=5.217e+01, percent-clipped=2.0 2024-08-14 07:36:32,720 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 07:36:50,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8600, loss[loss=0.08478, beats_loss=0.01097, ecapa_loss=0.0001983, whisper_loss=0.07183, over 20553.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001577, whisper_loss=0.09127, over 3883044.69 frames. ], batch size: 94, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:37:00,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2549720.0, ans=0.1 2024-08-14 07:37:03,383 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 07:37:03,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2549820.0, ans=0.0 2024-08-14 07:37:17,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2549820.0, ans=0.5 2024-08-14 07:37:23,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2549920.0, ans=0.0 2024-08-14 07:37:31,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2549920.0, ans=0.125 2024-08-14 07:37:33,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2024-08-14 07:37:41,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2024-08-14 07:37:43,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=15.0 2024-08-14 07:37:48,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2550020.0, ans=0.0 2024-08-14 07:37:52,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2550120.0, ans=0.2 2024-08-14 07:37:54,178 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 07:37:55,540 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-14 07:38:09,087 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8650, loss[loss=0.1142, beats_loss=0.008278, ecapa_loss=0.0001905, whisper_loss=0.104, over 18050.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001574, whisper_loss=0.09067, over 3873912.55 frames. ], batch size: 73, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:38:21,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-14 07:38:41,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2550420.0, ans=0.125 2024-08-14 07:38:56,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.324e+01 2.530e+01 2.821e+01 3.799e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-14 07:38:56,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2550520.0, ans=0.2 2024-08-14 07:39:00,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2550520.0, ans=0.125 2024-08-14 07:39:08,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2550620.0, ans=0.95 2024-08-14 07:39:20,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8700, loss[loss=0.1216, beats_loss=0.008261, ecapa_loss=0.0001562, whisper_loss=0.1118, over 19526.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001582, whisper_loss=0.09111, over 3881018.43 frames. ], batch size: 75, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:39:25,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2550720.0, ans=0.125 2024-08-14 07:39:42,757 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 07:39:54,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2550920.0, ans=0.0 2024-08-14 07:40:09,594 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:40:15,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2551020.0, ans=0.5 2024-08-14 07:40:15,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2024-08-14 07:40:31,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8750, loss[loss=0.1054, beats_loss=0.009785, ecapa_loss=0.0001466, whisper_loss=0.09417, over 15173.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001594, whisper_loss=0.09125, over 3865799.44 frames. ], batch size: 57, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:40:36,455 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 07:40:52,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2024-08-14 07:40:53,720 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 07:40:55,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. 
limit=15.0 2024-08-14 07:41:03,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2551420.0, ans=0.2 2024-08-14 07:41:06,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2551420.0, ans=0.2 2024-08-14 07:41:16,949 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.282e+01 2.583e+01 2.856e+01 3.464e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 07:41:42,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8800, loss[loss=0.08174, beats_loss=0.01347, ecapa_loss=0.0001921, whisper_loss=0.06635, over 17518.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001592, whisper_loss=0.09085, over 3861613.09 frames. ], batch size: 73, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:41:42,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2551720.0, ans=0.2 2024-08-14 07:42:28,089 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 13 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-14 07:42:28,477 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.548e-03 2024-08-14 07:42:46,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2552120.0, ans=0.2 2024-08-14 07:42:54,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8850, loss[loss=0.09987, beats_loss=0.0104, ecapa_loss=0.0001509, whisper_loss=0.08797, over 17870.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01083, ecapa_loss=0.0001568, whisper_loss=0.08951, over 3833325.45 frames. ], batch size: 71, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:43:13,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.06 vs. 
limit=10.0 2024-08-14 07:43:14,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2552320.0, ans=0.0 2024-08-14 07:43:26,830 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 07:43:34,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-08-14 07:43:39,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.372e+01 2.652e+01 3.112e+01 4.829e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-14 07:43:41,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2552520.0, ans=0.07 2024-08-14 07:43:48,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2552520.0, ans=0.0 2024-08-14 07:44:05,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8900, loss[loss=0.1195, beats_loss=0.01044, ecapa_loss=0.000139, whisper_loss=0.1076, over 21764.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0108, ecapa_loss=0.0001568, whisper_loss=0.09035, over 3832672.15 frames. ], batch size: 85, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:44:30,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2552820.0, ans=0.1 2024-08-14 07:44:31,621 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 07:44:45,164 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 07:44:46,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2552920.0, ans=0.0 2024-08-14 07:44:50,660 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 07:44:56,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2553020.0, ans=0.125 2024-08-14 07:44:59,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2553020.0, ans=0.0 2024-08-14 07:45:13,060 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 07:45:15,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-14 07:45:16,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2553220.0, ans=0.0 2024-08-14 07:45:16,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 8950, loss[loss=0.09287, beats_loss=0.009211, ecapa_loss=0.0001692, whisper_loss=0.08197, over 15711.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01087, ecapa_loss=0.0001557, whisper_loss=0.08973, over 3837186.45 frames. ], batch size: 61, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:45:17,172 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 07:45:20,192 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 07:45:26,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2553220.0, ans=0.1 2024-08-14 07:45:28,575 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 07:45:43,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2553320.0, ans=0.125 2024-08-14 07:45:48,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2553420.0, ans=0.125 2024-08-14 07:45:53,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2553420.0, ans=0.1 2024-08-14 07:46:03,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.450e+01 2.761e+01 3.148e+01 4.518e+01, threshold=5.522e+01, percent-clipped=0.0 2024-08-14 07:46:11,372 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 07:46:22,449 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.938e+00 2024-08-14 07:46:22,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-14 07:46:23,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2553620.0, ans=0.125 2024-08-14 07:46:27,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9000, loss[loss=0.09517, beats_loss=0.01303, ecapa_loss=0.0001226, whisper_loss=0.08091, over 20090.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01085, ecapa_loss=0.0001566, whisper_loss=0.09047, over 3869190.77 frames. ], batch size: 80, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:46:27,839 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 07:47:08,574 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005502, whisper_loss=0.2473, over 922467.00 frames. 
2024-08-14 07:47:28,575 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames. 2024-08-14 07:49:28,238 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 07:49:28,242 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 07:49:39,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2553720.0, ans=0.0 2024-08-14 07:49:39,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2553720.0, ans=0.125 2024-08-14 07:49:48,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2553820.0, ans=0.0 2024-08-14 07:49:59,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2553920.0, ans=0.125 2024-08-14 07:50:07,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2553920.0, ans=0.125 2024-08-14 07:50:27,920 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.978e-02 2024-08-14 07:50:37,705 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 07:50:38,792 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9050, loss[loss=0.09572, beats_loss=0.01149, ecapa_loss=0.0001171, whisper_loss=0.08305, over 19056.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.0001576, whisper_loss=0.09063, over 3851297.21 frames. ], batch size: 74, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:50:40,358 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
21 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 07:50:45,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2554220.0, ans=0.125 2024-08-14 07:50:57,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.95 vs. limit=15.0 2024-08-14 07:51:15,567 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 07:51:15,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2554420.0, ans=0.125 2024-08-14 07:51:24,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2554520.0, ans=0.125 2024-08-14 07:51:26,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2554520.0, ans=0.035 2024-08-14 07:51:27,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.420e+01 2.680e+01 3.001e+01 5.357e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-14 07:51:34,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2554520.0, ans=0.125 2024-08-14 07:51:49,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=12.0 2024-08-14 07:51:55,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9100, loss[loss=0.1013, beats_loss=0.009645, ecapa_loss=0.0001334, whisper_loss=0.09028, over 14791.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001577, whisper_loss=0.09051, over 3845398.56 frames. 
], batch size: 55, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:52:01,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-14 07:52:06,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2554720.0, ans=0.2 2024-08-14 07:52:11,536 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 07:52:12,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2554820.0, ans=10.0 2024-08-14 07:52:12,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2554820.0, ans=0.09899494936611666 2024-08-14 07:52:18,949 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 07:52:40,189 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-14 07:52:42,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2555020.0, ans=0.125 2024-08-14 07:52:55,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2555020.0, ans=15.0 2024-08-14 07:53:04,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2555120.0, ans=0.1 2024-08-14 07:53:11,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9150, loss[loss=0.127, beats_loss=0.007764, ecapa_loss=0.0002089, whisper_loss=0.1171, over 21344.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001583, whisper_loss=0.09093, over 3852629.51 frames. 
], batch size: 89, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:53:12,012 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-14 07:53:27,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2555320.0, ans=0.2 2024-08-14 07:53:30,399 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 07:53:50,141 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-14 07:53:53,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2555520.0, ans=0.125 2024-08-14 07:53:54,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2555520.0, ans=0.0 2024-08-14 07:53:57,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.410e+01 2.700e+01 3.056e+01 6.075e+01, threshold=5.399e+01, percent-clipped=3.0 2024-08-14 07:54:03,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.83 vs. limit=10.0 2024-08-14 07:54:21,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2555720.0, ans=0.2 2024-08-14 07:54:21,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9200, loss[loss=0.1038, beats_loss=0.009414, ecapa_loss=0.0001648, whisper_loss=0.09272, over 21513.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001583, whisper_loss=0.09069, over 3880519.70 frames. 
], batch size: 87, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:54:28,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2555720.0, ans=0.0 2024-08-14 07:54:31,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2555720.0, ans=0.0 2024-08-14 07:54:31,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2555720.0, ans=0.125 2024-08-14 07:54:39,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2555820.0, ans=0.0 2024-08-14 07:54:39,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=12.0 2024-08-14 07:55:06,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2556020.0, ans=0.125 2024-08-14 07:55:16,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2556020.0, ans=0.1 2024-08-14 07:55:33,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9250, loss[loss=0.09569, beats_loss=0.01283, ecapa_loss=0.00014, whisper_loss=0.08146, over 22037.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001583, whisper_loss=0.09092, over 3912108.95 frames. 
], batch size: 92, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:55:35,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2556220.0, ans=0.125 2024-08-14 07:55:40,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2556220.0, ans=0.0 2024-08-14 07:55:44,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2556220.0, ans=0.0 2024-08-14 07:55:53,403 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-14 07:55:59,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2556320.0, ans=0.0 2024-08-14 07:56:20,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.326e+01 2.739e+01 3.130e+01 4.617e+01, threshold=5.478e+01, percent-clipped=0.0 2024-08-14 07:56:28,845 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 9 from Vox, 35 fro AS 2024-08-14 07:56:37,565 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 07:56:41,400 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 07:56:43,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9300, loss[loss=0.1177, beats_loss=0.008555, ecapa_loss=0.0001646, whisper_loss=0.1075, over 20644.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001587, whisper_loss=0.09169, over 3930731.67 frames. ], batch size: 78, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:56:50,267 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 07:56:50,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2556720.0, ans=0.1 2024-08-14 07:57:00,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2556820.0, ans=0.1 2024-08-14 07:57:16,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2556920.0, ans=0.0 2024-08-14 07:57:39,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2557020.0, ans=0.04949747468305833 2024-08-14 07:57:56,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9350, loss[loss=0.09562, beats_loss=0.01151, ecapa_loss=0.000134, whisper_loss=0.08277, over 22091.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001592, whisper_loss=0.09148, over 3914017.13 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:58:15,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2557320.0, ans=0.0 2024-08-14 07:58:15,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2557320.0, ans=0.0 2024-08-14 07:58:27,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2557420.0, ans=0.0 2024-08-14 07:58:30,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.12 vs. 
limit=15.0 2024-08-14 07:58:42,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.320e+01 2.600e+01 2.954e+01 6.976e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-14 07:58:44,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2557520.0, ans=0.0 2024-08-14 07:58:50,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2557520.0, ans=0.1 2024-08-14 07:58:58,516 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 07:59:06,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9400, loss[loss=0.09994, beats_loss=0.01174, ecapa_loss=0.0001487, whisper_loss=0.08672, over 22050.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001586, whisper_loss=0.0916, over 3907287.20 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:59:09,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2557720.0, ans=0.125 2024-08-14 07:59:10,282 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 07:59:31,133 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 07:59:47,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-14 07:59:54,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. 
limit=15.0 2024-08-14 08:00:10,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2558120.0, ans=0.1 2024-08-14 08:00:17,382 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9450, loss[loss=0.0807, beats_loss=0.0134, ecapa_loss=0.0001946, whisper_loss=0.06536, over 21191.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001608, whisper_loss=0.09119, over 3904486.44 frames. ], batch size: 94, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:00:24,809 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 17 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 08:00:41,904 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 08:00:51,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-14 08:00:55,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2558420.0, ans=0.1 2024-08-14 08:00:59,187 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-14 08:00:59,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2558520.0, ans=0.07 2024-08-14 08:01:04,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.392e+01 2.683e+01 2.998e+01 2.159e+02, threshold=5.366e+01, percent-clipped=1.0 2024-08-14 08:01:16,233 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.602e+00 2024-08-14 08:01:23,562 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
21 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 08:01:28,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9500, loss[loss=0.1211, beats_loss=0.00923, ecapa_loss=0.0001913, whisper_loss=0.1099, over 16827.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001602, whisper_loss=0.09092, over 3915555.85 frames. ], batch size: 68, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:01:33,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.05 vs. limit=22.5 2024-08-14 08:01:37,324 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 08:02:07,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2558920.0, ans=0.125 2024-08-14 08:02:09,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2558920.0, ans=0.1 2024-08-14 08:02:11,712 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 08:02:26,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2559120.0, ans=0.0 2024-08-14 08:02:37,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-08-14 08:02:39,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9550, loss[loss=0.09074, beats_loss=0.0108, ecapa_loss=0.0002051, whisper_loss=0.07789, over 19664.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001613, whisper_loss=0.09075, over 3907618.85 frames. 
], batch size: 83, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:02:46,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.00 vs. limit=10.0 2024-08-14 08:02:55,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2559320.0, ans=0.125 2024-08-14 08:03:01,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2559320.0, ans=0.125 2024-08-14 08:03:22,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2559520.0, ans=0.125 2024-08-14 08:03:24,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2559520.0, ans=0.125 2024-08-14 08:03:26,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.380e+01 2.664e+01 3.149e+01 1.810e+02, threshold=5.328e+01, percent-clipped=2.0 2024-08-14 08:03:43,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2559620.0, ans=0.125 2024-08-14 08:03:50,383 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9600, loss[loss=0.104, beats_loss=0.008743, ecapa_loss=0.0001913, whisper_loss=0.09333, over 18882.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001609, whisper_loss=0.0905, over 3849109.09 frames. 
], batch size: 78, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:03:51,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2559720.0, ans=0.0 2024-08-14 08:03:54,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2559720.0, ans=0.0 2024-08-14 08:03:58,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2559720.0, ans=0.0 2024-08-14 08:04:07,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2024-08-14 08:04:10,799 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-14 08:04:15,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2559820.0, ans=0.125 2024-08-14 08:04:30,154 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-256000.pt 2024-08-14 08:04:35,023 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 08:04:39,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2560020.0, ans=0.04949747468305833 2024-08-14 08:04:46,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2560020.0, ans=0.125 2024-08-14 08:04:59,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. 
limit=15.0 2024-08-14 08:05:06,101 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9650, loss[loss=0.1005, beats_loss=0.01197, ecapa_loss=0.0001577, whisper_loss=0.08695, over 22691.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001601, whisper_loss=0.09006, over 3823060.95 frames. ], batch size: 91, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:05:15,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2560220.0, ans=0.2 2024-08-14 08:05:16,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2560220.0, ans=0.1 2024-08-14 08:05:28,938 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 08:05:33,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:37,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:38,716 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
33 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 08:05:38,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:39,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:40,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2560420.0, ans=0.0 2024-08-14 08:05:43,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2560420.0, ans=0.0 2024-08-14 08:05:44,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2560420.0, ans=0.1 2024-08-14 08:05:46,128 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 08:05:48,985 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 08:05:52,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.378e+01 2.605e+01 3.057e+01 7.649e+01, threshold=5.209e+01, percent-clipped=3.0 2024-08-14 08:06:01,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2560620.0, ans=0.07 2024-08-14 08:06:16,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9700, loss[loss=0.08002, beats_loss=0.01268, ecapa_loss=0.000161, whisper_loss=0.06573, over 13795.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.0001587, whisper_loss=0.09119, over 3883205.06 frames. 
], batch size: 58, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:06:23,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2560720.0, ans=0.125 2024-08-14 08:06:31,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2560820.0, ans=0.125 2024-08-14 08:06:39,694 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 08:06:45,469 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 08:07:25,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-14 08:07:28,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9750, loss[loss=0.1037, beats_loss=0.0115, ecapa_loss=0.0001387, whisper_loss=0.0908, over 23472.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001576, whisper_loss=0.0904, over 3828261.08 frames. ], batch size: 95, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:07:29,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2561220.0, ans=0.125 2024-08-14 08:07:30,384 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 08:07:30,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-14 08:07:48,953 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 08:07:54,705 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 08:08:15,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2561520.0, ans=0.1 2024-08-14 08:08:16,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.202e+01 2.404e+01 2.626e+01 3.852e+01, threshold=4.808e+01, percent-clipped=0.0 2024-08-14 08:08:16,329 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 08:08:19,088 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 08:08:24,978 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 08:08:28,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2024-08-14 08:08:40,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9800, loss[loss=0.09357, beats_loss=0.01568, ecapa_loss=0.0001358, whisper_loss=0.07653, over 19257.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001575, whisper_loss=0.09142, over 3878274.19 frames. ], batch size: 78, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:08:52,322 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-14 08:09:01,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2561820.0, ans=0.125 2024-08-14 08:09:03,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2561820.0, ans=0.125 2024-08-14 08:09:21,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2562020.0, ans=0.125 2024-08-14 08:09:26,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2562020.0, ans=0.125 2024-08-14 08:09:27,320 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 08:09:28,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2562020.0, ans=0.0 2024-08-14 08:09:42,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2562120.0, ans=0.1 2024-08-14 08:09:43,722 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 32 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-14 08:09:45,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2562120.0, ans=0.0 2024-08-14 08:09:46,208 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 08:09:49,395 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 08:09:50,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9850, loss[loss=0.104, beats_loss=0.01025, ecapa_loss=0.0001398, whisper_loss=0.09231, over 15636.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001579, whisper_loss=0.09167, over 3879237.10 frames. 
], batch size: 60, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:10:02,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2562220.0, ans=0.125 2024-08-14 08:10:15,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2562320.0, ans=0.2 2024-08-14 08:10:15,997 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 08:10:17,578 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 08:10:25,946 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 08:10:36,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.409e+01 2.672e+01 2.970e+01 5.427e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 08:10:42,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2562520.0, ans=0.1 2024-08-14 08:10:53,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2562620.0, ans=0.125 2024-08-14 08:11:00,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9900, loss[loss=0.0932, beats_loss=0.01149, ecapa_loss=0.0001601, whisper_loss=0.08011, over 20261.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001572, whisper_loss=0.09166, over 3894324.03 frames. ], batch size: 82, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:11:02,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. 
limit=6.0 2024-08-14 08:11:10,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-14 08:11:27,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2562920.0, ans=0.1 2024-08-14 08:11:34,802 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 08:11:42,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2563020.0, ans=0.2 2024-08-14 08:11:45,059 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 08:11:58,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2563120.0, ans=0.125 2024-08-14 08:11:59,571 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:12:01,773 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 08:12:11,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 9950, loss[loss=0.09808, beats_loss=0.01242, ecapa_loss=0.0001486, whisper_loss=0.08417, over 17110.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001561, whisper_loss=0.09112, over 3898670.54 frames. ], batch size: 70, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:12:16,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2563220.0, ans=0.0 2024-08-14 08:12:22,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2563220.0, ans=0.125 2024-08-14 08:12:42,866 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
32 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 08:12:50,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2563420.0, ans=0.125 2024-08-14 08:12:57,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2563520.0, ans=0.125 2024-08-14 08:12:58,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.516e+01 2.952e+01 4.420e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 08:12:58,690 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 08:13:04,286 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 08:13:22,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10000, loss[loss=0.1184, beats_loss=0.00959, ecapa_loss=0.0001576, whisper_loss=0.1073, over 22226.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001569, whisper_loss=0.09085, over 3874750.28 frames. ], batch size: 87, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:13:24,250 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-14 08:13:36,029 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:13:40,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2563820.0, ans=0.1 2024-08-14 08:13:42,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2563820.0, ans=0.125 2024-08-14 08:13:51,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2563920.0, ans=0.1 2024-08-14 08:13:54,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.23 vs. limit=15.0 2024-08-14 08:13:56,515 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 08:14:02,638 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 10 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 08:14:06,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-14 08:14:14,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2564020.0, ans=0.035 2024-08-14 08:14:23,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2564120.0, ans=0.0 2024-08-14 08:14:27,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.56 vs. 
limit=15.0 2024-08-14 08:14:33,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10050, loss[loss=0.1019, beats_loss=0.01185, ecapa_loss=0.0001402, whisper_loss=0.08866, over 21237.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001567, whisper_loss=0.09084, over 3877689.77 frames. ], batch size: 86, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:14:34,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-14 08:14:53,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2024-08-14 08:15:00,040 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 08:15:18,388 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 08:15:18,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2564520.0, ans=0.125 2024-08-14 08:15:19,752 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 08:15:22,197 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.332e+01 2.576e+01 2.987e+01 4.902e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 08:15:32,922 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 08:15:39,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.90 vs. 
limit=15.0 2024-08-14 08:15:40,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2564620.0, ans=0.0 2024-08-14 08:15:44,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2564620.0, ans=0.125 2024-08-14 08:15:47,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10100, loss[loss=0.09826, beats_loss=0.01311, ecapa_loss=0.000146, whisper_loss=0.08368, over 21511.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01085, ecapa_loss=0.000158, whisper_loss=0.09114, over 3912720.15 frames. ], batch size: 90, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:15:49,303 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 08:15:51,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2564720.0, ans=0.125 2024-08-14 08:15:52,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2564720.0, ans=0.0 2024-08-14 08:16:31,816 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 08:16:33,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2564920.0, ans=0.125 2024-08-14 08:16:41,738 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 08:16:43,488 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-14 08:16:46,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2565020.0, ans=0.1 2024-08-14 08:16:53,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2565120.0, ans=0.125 2024-08-14 08:17:09,436 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:17:12,503 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10150, loss[loss=0.09288, beats_loss=0.009574, ecapa_loss=0.0001843, whisper_loss=0.08146, over 20362.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001598, whisper_loss=0.09124, over 3939024.68 frames. ], batch size: 87, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:17:16,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2565220.0, ans=0.125 2024-08-14 08:17:27,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2565320.0, ans=0.0 2024-08-14 08:17:35,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2565320.0, ans=0.1 2024-08-14 08:17:50,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. 
limit=12.0 2024-08-14 08:17:57,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2565420.0, ans=15.0 2024-08-14 08:18:01,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2565420.0, ans=0.1 2024-08-14 08:18:03,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0 2024-08-14 08:18:04,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2565520.0, ans=0.2 2024-08-14 08:18:06,969 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 08:18:08,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.364e+01 2.632e+01 2.890e+01 4.484e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-14 08:18:27,134 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 08:18:36,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10200, loss[loss=0.09914, beats_loss=0.01247, ecapa_loss=0.000161, whisper_loss=0.08506, over 18043.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001592, whisper_loss=0.09078, over 3916395.10 frames. ], batch size: 72, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:18:36,929 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 08:19:02,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2565820.0, ans=0.125 2024-08-14 08:19:20,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2565920.0, ans=0.5 2024-08-14 08:19:38,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2566020.0, ans=0.0 2024-08-14 08:19:39,638 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 08:19:46,134 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 08:20:04,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10250, loss[loss=0.1192, beats_loss=0.01096, ecapa_loss=0.000143, whisper_loss=0.1068, over 22979.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001597, whisper_loss=0.09092, over 3892207.53 frames. ], batch size: 89, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:20:12,749 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
26 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 08:20:13,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2566220.0, ans=0.0 2024-08-14 08:20:16,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2566220.0, ans=0.0 2024-08-14 08:20:21,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2566320.0, ans=0.125 2024-08-14 08:20:57,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.331e+01 2.528e+01 2.980e+01 4.721e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-14 08:21:04,818 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-14 08:21:17,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2566620.0, ans=0.125 2024-08-14 08:21:26,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-14 08:21:26,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10300, loss[loss=0.1166, beats_loss=0.01114, ecapa_loss=0.0001536, whisper_loss=0.1039, over 21963.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001602, whisper_loss=0.09034, over 3897208.44 frames. 
], batch size: 87, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:21:37,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2566720.0, ans=0.125 2024-08-14 08:22:07,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2566920.0, ans=0.125 2024-08-14 08:22:08,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2566920.0, ans=0.0 2024-08-14 08:22:26,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2567020.0, ans=0.025 2024-08-14 08:22:38,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-14 08:22:38,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=22.5 2024-08-14 08:22:51,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10350, loss[loss=0.09482, beats_loss=0.01364, ecapa_loss=0.0001275, whisper_loss=0.0799, over 15627.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001584, whisper_loss=0.09105, over 3934654.57 frames. 
], batch size: 63, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:22:56,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2567220.0, ans=0.125 2024-08-14 08:22:58,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2567220.0, ans=0.5 2024-08-14 08:23:01,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2024-08-14 08:23:07,723 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 08:23:09,567 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-14 08:23:21,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2567320.0, ans=0.125 2024-08-14 08:23:51,940 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.354e+01 2.584e+01 2.935e+01 4.636e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-14 08:23:55,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2567520.0, ans=0.0 2024-08-14 08:24:04,300 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 08:24:22,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10400, loss[loss=0.1123, beats_loss=0.01012, ecapa_loss=0.0001788, whisper_loss=0.1004, over 15120.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001575, whisper_loss=0.09144, over 3941803.18 frames. ], batch size: 60, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:24:42,061 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
20 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 08:24:44,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2024-08-14 08:24:51,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2567820.0, ans=0.04949747468305833 2024-08-14 08:24:53,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2567820.0, ans=0.125 2024-08-14 08:25:14,380 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 08:25:33,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-08-14 08:25:49,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10450, loss[loss=0.09323, beats_loss=0.009483, ecapa_loss=0.0001654, whisper_loss=0.08209, over 19742.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001578, whisper_loss=0.09058, over 3903617.55 frames. ], batch size: 78, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:25:50,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2568220.0, ans=0.125 2024-08-14 08:25:52,220 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.353e+01 2024-08-14 08:26:07,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2568320.0, ans=0.125 2024-08-14 08:26:14,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2568320.0, ans=0.125 2024-08-14 08:26:22,392 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 08:26:37,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2568520.0, ans=0.0 2024-08-14 08:26:41,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.325e+01 2.620e+01 2.982e+01 4.539e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-14 08:26:42,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.65 vs. limit=10.0 2024-08-14 08:26:45,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2568520.0, ans=0.05 2024-08-14 08:27:07,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2024-08-14 08:27:08,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10500, loss[loss=0.1178, beats_loss=0.0108, ecapa_loss=0.0001358, whisper_loss=0.1057, over 24329.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001574, whisper_loss=0.09107, over 3930648.30 frames. ], batch size: 91, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:27:20,521 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 08:27:34,405 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 08:27:43,989 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 08:27:59,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.03 vs. limit=22.5 2024-08-14 08:28:10,095 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 08:28:21,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2569120.0, ans=0.035 2024-08-14 08:28:41,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10550, loss[loss=0.1155, beats_loss=0.009224, ecapa_loss=0.0001571, whisper_loss=0.1047, over 18925.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001578, whisper_loss=0.09123, over 3896550.50 frames. ], batch size: 75, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:28:59,346 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 08:29:03,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2569320.0, ans=0.0 2024-08-14 08:29:03,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2569320.0, ans=0.0 2024-08-14 08:29:20,962 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 08:29:36,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2569520.0, ans=0.125 2024-08-14 08:29:41,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.277e+01 2.540e+01 2.895e+01 1.094e+02, threshold=5.080e+01, percent-clipped=1.0 2024-08-14 08:29:50,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2569620.0, ans=0.09899494936611666 2024-08-14 08:29:53,341 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-14 08:30:06,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10600, loss[loss=0.1099, beats_loss=0.01019, ecapa_loss=0.0001507, whisper_loss=0.09816, over 22933.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001579, whisper_loss=0.09081, over 3892392.39 frames. ], batch size: 90, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:30:10,774 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 8 from Vox, 24 fro AS 2024-08-14 08:30:27,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2569820.0, ans=0.1 2024-08-14 08:30:47,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2569920.0, ans=0.1 2024-08-14 08:30:58,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5 2024-08-14 08:31:11,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2570120.0, ans=0.0 2024-08-14 08:31:12,267 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2024-08-14 08:31:16,071 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 08:31:24,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10650, loss[loss=0.1163, beats_loss=0.009597, ecapa_loss=0.0001575, whisper_loss=0.1051, over 23628.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001568, whisper_loss=0.09047, over 3864049.32 frames. ], batch size: 94, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:31:35,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2570220.0, ans=0.125 2024-08-14 08:31:43,005 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 08:31:49,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2570320.0, ans=0.125 2024-08-14 08:32:07,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2570420.0, ans=0.125 2024-08-14 08:32:08,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2570420.0, ans=0.2 2024-08-14 08:32:10,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2570420.0, ans=0.125 2024-08-14 08:32:16,410 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 08:32:21,096 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.360e+01 2.616e+01 3.033e+01 9.241e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-14 08:32:28,177 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 08:32:32,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2570620.0, ans=0.125 2024-08-14 08:32:33,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=12.0 2024-08-14 08:32:39,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-08-14 08:32:52,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10700, loss[loss=0.09999, beats_loss=0.01139, ecapa_loss=0.0001686, whisper_loss=0.08691, over 21735.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001581, whisper_loss=0.09122, over 3863336.29 frames. 
], batch size: 90, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:33:03,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2570720.0, ans=0.0 2024-08-14 08:33:05,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2570720.0, ans=0.035 2024-08-14 08:33:08,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2570820.0, ans=0.125 2024-08-14 08:33:12,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2570820.0, ans=0.0 2024-08-14 08:33:20,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2570820.0, ans=0.0 2024-08-14 08:34:12,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2571120.0, ans=0.125 2024-08-14 08:34:21,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10750, loss[loss=0.1147, beats_loss=0.01117, ecapa_loss=0.0001723, whisper_loss=0.1018, over 22474.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001572, whisper_loss=0.09055, over 3857580.90 frames. ], batch size: 92, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:34:33,241 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 08:34:35,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2571320.0, ans=0.1 2024-08-14 08:34:43,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2571320.0, ans=0.125 2024-08-14 08:34:49,407 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 08:34:53,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2571420.0, ans=0.125 2024-08-14 08:34:55,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2571420.0, ans=0.0 2024-08-14 08:35:03,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2571420.0, ans=0.0 2024-08-14 08:35:13,232 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.456e+01 2.714e+01 3.010e+01 3.209e+02, threshold=5.428e+01, percent-clipped=1.0 2024-08-14 08:35:15,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-14 08:35:23,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2571620.0, ans=0.125 2024-08-14 08:35:36,833 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 38 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-14 08:35:38,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10800, loss[loss=0.1239, beats_loss=0.01147, ecapa_loss=0.0001237, whisper_loss=0.1112, over 24479.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001567, whisper_loss=0.09166, over 3879096.65 frames. ], batch size: 93, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:35:47,957 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 08:35:49,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2571720.0, ans=0.125 2024-08-14 08:35:55,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2571820.0, ans=0.125 2024-08-14 08:36:02,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2571820.0, ans=0.125 2024-08-14 08:36:20,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2571920.0, ans=0.1 2024-08-14 08:36:44,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-14 08:36:48,441 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 08:36:52,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10850, loss[loss=0.1039, beats_loss=0.01028, ecapa_loss=0.0001713, whisper_loss=0.09186, over 19694.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.000158, whisper_loss=0.09209, over 3898887.44 frames. ], batch size: 80, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:36:57,514 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 08:37:22,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2572420.0, ans=0.0 2024-08-14 08:37:25,161 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 08:37:26,650 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
17 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 08:37:45,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.429e+01 2.769e+01 3.241e+01 1.860e+02, threshold=5.537e+01, percent-clipped=2.0 2024-08-14 08:38:09,681 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 08:38:16,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-08-14 08:38:17,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10900, loss[loss=0.09539, beats_loss=0.01135, ecapa_loss=0.0001317, whisper_loss=0.08273, over 22121.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.000159, whisper_loss=0.09188, over 3913076.16 frames. ], batch size: 91, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:38:22,927 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 08:38:56,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2572920.0, ans=0.1 2024-08-14 08:39:00,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2572920.0, ans=0.125 2024-08-14 08:39:04,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2024-08-14 08:39:06,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2024-08-14 08:39:44,329 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 08:39:47,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 10950, loss[loss=0.07794, beats_loss=0.01256, ecapa_loss=0.0001418, whisper_loss=0.06396, over 15881.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.000158, whisper_loss=0.09193, over 3935724.37 frames. ], batch size: 63, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:39:53,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2573220.0, ans=0.0 2024-08-14 08:39:53,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2573220.0, ans=0.07 2024-08-14 08:39:56,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2573220.0, ans=0.1 2024-08-14 08:39:56,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2573220.0, ans=0.0 2024-08-14 08:40:08,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2573320.0, ans=0.125 2024-08-14 08:40:16,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2573420.0, ans=0.2 2024-08-14 08:40:37,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.432e+01 2.668e+01 2.934e+01 4.215e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 08:40:57,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2573620.0, ans=0.0 2024-08-14 08:41:05,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11000, loss[loss=0.09801, beats_loss=0.01106, ecapa_loss=0.0001673, whisper_loss=0.08528, over 18964.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001578, whisper_loss=0.09165, over 3908917.11 frames. ], batch size: 76, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:41:22,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2573820.0, ans=0.125 2024-08-14 08:41:23,851 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 08:41:29,195 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 08:41:38,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2573820.0, ans=0.125 2024-08-14 08:41:56,245 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 17 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 08:42:38,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11050, loss[loss=0.06924, beats_loss=0.007888, ecapa_loss=0.0001768, whisper_loss=0.05958, over 15065.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001574, whisper_loss=0.09119, over 3905075.88 frames. 
], batch size: 62, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:42:50,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2574220.0, ans=0.125 2024-08-14 08:43:25,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2574420.0, ans=0.0 2024-08-14 08:43:41,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.313e+01 2.557e+01 2.807e+01 4.067e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 08:44:11,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2574620.0, ans=0.07 2024-08-14 08:44:20,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11100, loss[loss=0.1084, beats_loss=0.01163, ecapa_loss=0.0001383, whisper_loss=0.09538, over 23123.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001575, whisper_loss=0.09116, over 3916411.41 frames. ], batch size: 91, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:44:28,531 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 08:44:49,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2574820.0, ans=0.125 2024-08-14 08:44:49,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2574820.0, ans=0.125 2024-08-14 08:44:51,779 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-14 08:45:17,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2574920.0, ans=0.1 2024-08-14 08:45:38,379 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
20 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 08:45:41,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2575020.0, ans=0.07 2024-08-14 08:45:44,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2575020.0, ans=0.125 2024-08-14 08:45:44,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2575020.0, ans=0.125 2024-08-14 08:45:58,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2575120.0, ans=0.125 2024-08-14 08:46:01,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2575120.0, ans=0.125 2024-08-14 08:46:13,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11150, loss[loss=0.08982, beats_loss=0.0132, ecapa_loss=0.0001513, whisper_loss=0.0751, over 19255.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0107, ecapa_loss=0.000157, whisper_loss=0.09199, over 3944521.83 frames. ], batch size: 78, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:46:19,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2024-08-14 08:46:37,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2575320.0, ans=0.2 2024-08-14 08:46:41,006 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
10 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-14 08:47:03,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2575420.0, ans=0.125 2024-08-14 08:47:23,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2575520.0, ans=0.125 2024-08-14 08:47:29,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.286e+01 2.573e+01 3.032e+01 5.380e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-14 08:48:11,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11200, loss[loss=0.1205, beats_loss=0.007638, ecapa_loss=0.000186, whisper_loss=0.111, over 20334.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001571, whisper_loss=0.0915, over 3902231.96 frames. ], batch size: 80, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:48:19,994 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 08:48:32,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2575820.0, ans=0.0 2024-08-14 08:48:41,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2575820.0, ans=0.0 2024-08-14 08:48:44,807 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-14 08:48:51,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2024-08-14 08:49:04,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2575920.0, ans=0.09899494936611666 2024-08-14 08:49:19,514 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-14 08:49:36,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11250, loss[loss=0.1018, beats_loss=0.01273, ecapa_loss=0.0001383, whisper_loss=0.08773, over 22412.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01061, ecapa_loss=0.0001569, whisper_loss=0.09234, over 3890829.13 frames. ], batch size: 86, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:49:41,835 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 08:49:42,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2576220.0, ans=0.125 2024-08-14 08:49:47,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2576220.0, ans=0.0 2024-08-14 08:49:55,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2576320.0, ans=0.2 2024-08-14 08:49:55,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=12.0 2024-08-14 08:50:26,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.439e+01 2.714e+01 2.976e+01 1.044e+02, threshold=5.429e+01, percent-clipped=2.0 2024-08-14 08:50:46,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2576620.0, ans=0.1 2024-08-14 08:50:54,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11300, loss[loss=0.09543, beats_loss=0.009839, ecapa_loss=0.0002167, whisper_loss=0.08342, over 19166.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01056, ecapa_loss=0.0001571, whisper_loss=0.09229, over 3869199.08 frames. ], batch size: 81, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:50:54,705 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
25 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-14 08:50:56,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2576720.0, ans=0.125 2024-08-14 08:50:58,203 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 08:51:31,920 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 08:51:33,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2576920.0, ans=0.07 2024-08-14 08:51:41,455 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 08:51:46,596 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-14 08:51:49,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2577020.0, ans=0.125 2024-08-14 08:51:53,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2577020.0, ans=0.0 2024-08-14 08:52:16,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11350, loss[loss=0.09388, beats_loss=0.0116, ecapa_loss=0.0001471, whisper_loss=0.08081, over 16010.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01056, ecapa_loss=0.0001578, whisper_loss=0.09225, over 3872043.61 frames. ], batch size: 63, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:52:16,306 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 30 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 08:52:24,831 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
25 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 08:52:34,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2577320.0, ans=0.0 2024-08-14 08:52:51,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2577420.0, ans=10.0 2024-08-14 08:52:56,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2577420.0, ans=0.125 2024-08-14 08:53:05,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2577520.0, ans=0.125 2024-08-14 08:53:14,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.319e+01 2.550e+01 2.857e+01 6.146e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-14 08:53:30,872 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-14 08:53:40,272 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11400, loss[loss=0.09526, beats_loss=0.01171, ecapa_loss=0.0001487, whisper_loss=0.08206, over 22430.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01058, ecapa_loss=0.0001577, whisper_loss=0.0923, over 3863056.23 frames. ], batch size: 93, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:53:47,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2577720.0, ans=0.125 2024-08-14 08:53:59,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2577820.0, ans=0.125 2024-08-14 08:54:10,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-08-14 08:54:24,298 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
30 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 08:54:36,772 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 08:54:38,094 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-14 08:54:41,267 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 08:54:50,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2578120.0, ans=0.0 2024-08-14 08:54:51,729 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 08:54:58,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11450, loss[loss=0.1292, beats_loss=0.00801, ecapa_loss=0.000173, whisper_loss=0.1194, over 19491.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001574, whisper_loss=0.09172, over 3867188.72 frames. ], batch size: 76, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:55:01,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2578220.0, ans=0.025 2024-08-14 08:55:18,118 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
28 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 08:55:35,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2578420.0, ans=0.0 2024-08-14 08:55:46,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2578520.0, ans=0.0 2024-08-14 08:55:48,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.444e+01 2.679e+01 2.887e+01 5.368e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-14 08:56:13,509 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11500, loss[loss=0.1181, beats_loss=0.01057, ecapa_loss=0.0001504, whisper_loss=0.106, over 22804.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01066, ecapa_loss=0.0001573, whisper_loss=0.09201, over 3880877.31 frames. ], batch size: 89, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:56:15,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2578720.0, ans=0.2 2024-08-14 08:56:19,723 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 29 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 08:56:24,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2578720.0, ans=0.0 2024-08-14 08:56:56,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2578920.0, ans=0.0 2024-08-14 08:56:59,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-14 08:57:14,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2579120.0, ans=0.1 2024-08-14 08:57:19,863 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 08:57:28,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11550, loss[loss=0.1003, beats_loss=0.009634, ecapa_loss=0.0001828, whisper_loss=0.08889, over 15092.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0106, ecapa_loss=0.0001581, whisper_loss=0.09229, over 3878201.49 frames. ], batch size: 62, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:57:29,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2579220.0, ans=0.2 2024-08-14 08:57:46,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2579320.0, ans=10.0 2024-08-14 08:57:54,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2579320.0, ans=0.2 2024-08-14 08:57:58,148 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 08:58:05,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2579420.0, ans=0.125 2024-08-14 08:58:05,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2024-08-14 08:58:18,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.371e+01 2.639e+01 3.011e+01 4.840e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-14 08:58:36,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2579620.0, ans=0.125 2024-08-14 08:58:38,756 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 08:58:41,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11600, loss[loss=0.07596, beats_loss=0.01299, ecapa_loss=0.0001496, whisper_loss=0.06147, over 14701.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01055, ecapa_loss=0.0001588, whisper_loss=0.09223, over 3881582.33 frames. ], batch size: 60, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:58:49,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2579720.0, ans=0.1 2024-08-14 08:58:56,306 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 08:59:05,242 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 08:59:07,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2579820.0, ans=0.125 2024-08-14 08:59:09,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2579920.0, ans=0.125 2024-08-14 08:59:20,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2579920.0, ans=0.1 2024-08-14 08:59:25,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2580020.0, ans=0.1 2024-08-14 08:59:34,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2580020.0, ans=0.0 2024-08-14 08:59:36,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2580020.0, ans=0.0 2024-08-14 08:59:40,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2580120.0, ans=0.0 2024-08-14 08:59:46,649 INFO 
[scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2580120.0, ans=0.125 2024-08-14 08:59:48,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2580120.0, ans=0.2 2024-08-14 08:59:53,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11650, loss[loss=0.1159, beats_loss=0.01063, ecapa_loss=0.0001797, whisper_loss=0.1035, over 22110.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001582, whisper_loss=0.09139, over 3907975.71 frames. ], batch size: 91, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:59:53,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2580220.0, ans=0.125 2024-08-14 08:59:55,197 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.596e+01 2024-08-14 09:00:04,684 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 09:00:41,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.343e+01 2.621e+01 2.855e+01 6.176e+01, threshold=5.243e+01, percent-clipped=1.0 2024-08-14 09:01:06,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11700, loss[loss=0.1124, beats_loss=0.01102, ecapa_loss=0.000214, whisper_loss=0.09926, over 18900.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001593, whisper_loss=0.0907, over 3921789.23 frames. ], batch size: 77, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:01:08,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2580720.0, ans=0.125 2024-08-14 09:01:14,477 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 09:01:17,169 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 09:01:33,172 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 36 from Vox, 29 fro AS 2024-08-14 09:01:48,991 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 09:01:53,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2581020.0, ans=0.125 2024-08-14 09:02:02,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2581020.0, ans=0.125 2024-08-14 09:02:18,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11750, loss[loss=0.08738, beats_loss=0.01385, ecapa_loss=0.0001317, whisper_loss=0.07222, over 20648.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001597, whisper_loss=0.09044, over 3908407.63 frames. ], batch size: 88, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:02:18,303 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 09:02:39,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2581320.0, ans=0.09899494936611666 2024-08-14 09:02:54,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2581420.0, ans=0.125 2024-08-14 09:03:06,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.73 vs. 
limit=15.0 2024-08-14 09:03:07,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.345e+01 2.656e+01 2.861e+01 8.705e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-14 09:03:07,607 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.808e-03 2024-08-14 09:03:08,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2581520.0, ans=0.125 2024-08-14 09:03:14,530 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 09:03:15,932 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 09:03:21,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2581620.0, ans=0.125 2024-08-14 09:03:25,149 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 09:03:28,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=15.0 2024-08-14 09:03:30,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11800, loss[loss=0.07724, beats_loss=0.01175, ecapa_loss=0.000184, whisper_loss=0.06365, over 21193.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001594, whisper_loss=0.09059, over 3924634.44 frames. ], batch size: 93, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:03:35,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2581720.0, ans=0.2 2024-08-14 09:03:37,426 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 09:03:45,230 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 09:03:47,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2581820.0, ans=0.05 2024-08-14 09:03:55,904 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-14 09:03:59,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2581820.0, ans=0.1 2024-08-14 09:04:01,902 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 09:04:25,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=12.0 2024-08-14 09:04:47,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2582120.0, ans=0.125 2024-08-14 09:04:53,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2582120.0, ans=0.125 2024-08-14 09:04:54,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2582120.0, ans=0.0 2024-08-14 09:04:58,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11850, loss[loss=0.1055, beats_loss=0.01198, ecapa_loss=0.0001289, whisper_loss=0.09227, over 23349.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001594, whisper_loss=0.09115, over 3915064.63 frames. 
], batch size: 91, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:05:21,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2582320.0, ans=0.2 2024-08-14 09:05:36,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2582420.0, ans=0.125 2024-08-14 09:05:48,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2582420.0, ans=0.2 2024-08-14 09:05:51,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2582520.0, ans=0.0 2024-08-14 09:06:01,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.374e+01 2.631e+01 2.932e+01 6.705e+01, threshold=5.263e+01, percent-clipped=1.0 2024-08-14 09:06:06,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2582520.0, ans=0.2 2024-08-14 09:06:19,253 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 09:06:30,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11900, loss[loss=0.109, beats_loss=0.009077, ecapa_loss=0.0001553, whisper_loss=0.09841, over 18762.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.0001591, whisper_loss=0.0913, over 3933934.13 frames. ], batch size: 72, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:06:31,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-08-14 09:06:40,400 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 09:07:10,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=12.0 2024-08-14 09:08:01,781 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 11950, loss[loss=0.08998, beats_loss=0.01069, ecapa_loss=0.0001621, whisper_loss=0.07767, over 17567.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001589, whisper_loss=0.09037, over 3897092.08 frames. ], batch size: 73, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:08:18,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0 2024-08-14 09:08:18,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.38 vs. limit=10.0 2024-08-14 09:08:21,230 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 09:08:25,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2583320.0, ans=0.125 2024-08-14 09:08:40,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2024-08-14 09:08:51,223 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-14 09:08:55,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.350e+01 2.661e+01 2.995e+01 4.363e+01, threshold=5.322e+01, percent-clipped=0.0 2024-08-14 09:09:04,850 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
27 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-14 09:09:10,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2583620.0, ans=0.0 2024-08-14 09:09:10,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2583620.0, ans=0.95 2024-08-14 09:09:18,072 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.388e-01 2024-08-14 09:09:22,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12000, loss[loss=0.09887, beats_loss=0.01262, ecapa_loss=0.0001463, whisper_loss=0.08479, over 21851.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001586, whisper_loss=0.09101, over 3917070.05 frames. ], batch size: 87, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:09:22,785 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 09:10:01,019 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005459, whisper_loss=0.2479, over 922467.00 frames. 2024-08-14 09:10:19,258 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on SV_voxceleb1: loss=0.004372, beats_loss=0, ecapa_loss=0.0004372, whisper_loss=0, over 939242.00 frames. 2024-08-14 09:12:09,247 INFO [train_multi_KD3.py:1149] (0/4) Epoch 18, validation on AT_audioset: loss=0.02349, beats_loss=0.02349, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
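The three validation entries above exercise one task head each (ASR_libri: whisper + ecapa, SV_voxceleb1: ecapa only, AT_audioset: beats only), and both the training `tot_loss` fields and these validation totals are consistent with a weighted sum in which the ECAPA term carries a scale of 10. A minimal sketch of that combination; the scale defaults are assumptions read off the experiment directory name in the header (`..._scale_1.0_use_ecapa_1_..._scale_10.0_...`), and the helper name is hypothetical, not icefall's API:

```python
def combined_kd_loss(beats_loss, ecapa_loss, whisper_loss,
                     beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum matching the logged loss=... fields.

    The scale defaults are assumptions: the logged totals in this chunk
    only add up when the ECAPA term is multiplied by 10, while the BEATs
    and Whisper terms enter with weight 1.
    """
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)
```

For example, the batch 10900 entry (beats_loss=0.01135, ecapa_loss=0.0001317, whisper_loss=0.08273) combines to 0.095397, matching the logged loss=0.09539 to within rounding, and the SV_voxceleb1 validation total above is exactly 10 times its ecapa_loss.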
2024-08-14 09:12:09,251 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 09:12:19,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2583720.0, ans=0.0 2024-08-14 09:12:35,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2583820.0, ans=0.2 2024-08-14 09:12:37,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2583820.0, ans=0.04949747468305833 2024-08-14 09:12:55,221 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 09:13:09,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2584020.0, ans=0.0 2024-08-14 09:13:14,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2584120.0, ans=0.125 2024-08-14 09:13:21,706 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 09:13:27,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12050, loss[loss=0.1143, beats_loss=0.009595, ecapa_loss=0.0001603, whisper_loss=0.1031, over 22121.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001576, whisper_loss=0.09101, over 3895726.19 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:13:36,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2584220.0, ans=0.0 2024-08-14 09:13:52,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2584320.0, ans=0.2 2024-08-14 09:13:59,467 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 09:14:02,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2584420.0, ans=0.125 2024-08-14 09:14:04,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=15.0 2024-08-14 09:14:05,418 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 09:14:06,702 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 09:14:08,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2584420.0, ans=0.125 2024-08-14 09:14:20,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.316e+01 2.502e+01 2.940e+01 4.119e+01, threshold=5.004e+01, percent-clipped=0.0 2024-08-14 09:14:33,309 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 09:14:35,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2584620.0, ans=0.125 2024-08-14 09:14:44,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12100, loss[loss=0.1093, beats_loss=0.009328, ecapa_loss=0.0001854, whisper_loss=0.09815, over 19537.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001581, whisper_loss=0.0911, over 3909354.63 frames. 
], batch size: 80, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:15:07,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2584820.0, ans=0.125 2024-08-14 09:15:09,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2584820.0, ans=0.125 2024-08-14 09:15:23,635 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 09:15:28,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2585020.0, ans=0.0 2024-08-14 09:15:47,817 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 09:15:51,020 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.464e+05 2024-08-14 09:15:51,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-08-14 09:15:59,267 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.960e+00 2024-08-14 09:16:00,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12150, loss[loss=0.1154, beats_loss=0.009756, ecapa_loss=0.0001693, whisper_loss=0.1039, over 21747.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001579, whisper_loss=0.09134, over 3845700.26 frames. ], batch size: 88, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:16:05,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2585220.0, ans=0.1 2024-08-14 09:16:12,495 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
32 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 09:16:17,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2585320.0, ans=0.2 2024-08-14 09:16:22,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2585320.0, ans=10.0 2024-08-14 09:16:24,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2585320.0, ans=0.125 2024-08-14 09:16:30,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-14 09:16:50,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.391e+01 2.592e+01 3.075e+01 2.876e+02, threshold=5.185e+01, percent-clipped=6.0 2024-08-14 09:17:01,896 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 09:17:06,850 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 09:17:15,305 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12200, loss[loss=0.1158, beats_loss=0.009897, ecapa_loss=0.0001866, whisper_loss=0.1041, over 16561.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001572, whisper_loss=0.09153, over 3870321.04 frames. ], batch size: 71, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:17:20,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2585720.0, ans=0.125 2024-08-14 09:17:33,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.93 vs. 
limit=22.5 2024-08-14 09:17:38,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2585820.0, ans=0.125 2024-08-14 09:17:40,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2585820.0, ans=0.95 2024-08-14 09:17:42,066 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-14 09:18:06,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2586020.0, ans=0.0 2024-08-14 09:18:13,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2586120.0, ans=0.2 2024-08-14 09:18:19,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2586120.0, ans=0.09899494936611666 2024-08-14 09:18:23,351 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06917443126440048, model_norm_threshold=51.84561538696289 2024-08-14 09:18:23,562 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.256e+05, grad_sumsq=1.256e+05, orig_rms_sq=1.000e+00 2024-08-14 09:18:29,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12250, loss[loss=0.1093, beats_loss=0.0112, ecapa_loss=0.0001279, whisper_loss=0.09686, over 23369.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0106, ecapa_loss=0.0001586, whisper_loss=0.09165, over 3866432.41 frames. ], batch size: 87, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:18:37,353 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 09:19:20,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2586520.0, ans=0.0 2024-08-14 09:19:23,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.394e+01 2.719e+01 3.099e+01 7.495e+02, threshold=5.439e+01, percent-clipped=1.0 2024-08-14 09:19:35,015 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 09:19:41,552 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 09:19:44,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2024-08-14 09:19:47,136 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12300, loss[loss=0.09184, beats_loss=0.01376, ecapa_loss=0.0001401, whisper_loss=0.07667, over 16218.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001589, whisper_loss=0.09159, over 3867688.64 frames. ], batch size: 67, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:20:05,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2586820.0, ans=0.0 2024-08-14 09:20:17,134 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-14 09:20:29,922 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 12 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 09:20:35,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-14 09:21:13,962 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
15 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 09:21:22,169 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12350, loss[loss=0.1177, beats_loss=0.009583, ecapa_loss=0.0001416, whisper_loss=0.1067, over 19629.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.000159, whisper_loss=0.0915, over 3903587.28 frames. ], batch size: 75, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:21:22,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2587220.0, ans=0.1 2024-08-14 09:21:25,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2587220.0, ans=0.0 2024-08-14 09:21:32,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2587220.0, ans=0.125 2024-08-14 09:21:34,274 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 09:21:37,683 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 18 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 09:21:42,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2587320.0, ans=0.025 2024-08-14 09:21:49,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2587320.0, ans=15.0 2024-08-14 09:21:59,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2587420.0, ans=0.0 2024-08-14 09:22:05,463 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-14 09:22:12,723 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 09:22:24,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.397e+01 2.618e+01 2.960e+01 3.782e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-14 09:22:27,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2587520.0, ans=0.07 2024-08-14 09:22:36,124 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 09:22:47,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12400, loss[loss=0.1139, beats_loss=0.00864, ecapa_loss=0.0001581, whisper_loss=0.1037, over 15125.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.000157, whisper_loss=0.09136, over 3901051.06 frames. ], batch size: 60, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:22:51,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2587720.0, ans=0.125 2024-08-14 09:23:00,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2587720.0, ans=0.125 2024-08-14 09:23:45,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2588020.0, ans=0.0 2024-08-14 09:23:56,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2588120.0, ans=0.0 2024-08-14 09:24:02,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12450, loss[loss=0.08932, beats_loss=0.01197, ecapa_loss=0.0001419, whisper_loss=0.07592, over 21507.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001566, whisper_loss=0.09064, over 3892654.94 frames. 
], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:24:09,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2588220.0, ans=0.0 2024-08-14 09:24:14,500 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-14 09:24:15,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2024-08-14 09:24:43,412 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 13 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 09:24:54,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2588520.0, ans=0.0 2024-08-14 09:24:55,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.420e+01 2.657e+01 3.140e+01 9.625e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-14 09:25:01,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-14 09:25:09,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2588620.0, ans=0.1 2024-08-14 09:25:18,900 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12500, loss[loss=0.1087, beats_loss=0.01058, ecapa_loss=0.0001544, whisper_loss=0.09654, over 19605.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.000157, whisper_loss=0.09045, over 3874723.95 frames. ], batch size: 77, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:25:20,562 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 09:25:25,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2588720.0, ans=0.0 2024-08-14 09:25:28,587 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 09:25:30,223 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 09:25:33,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2588820.0, ans=0.125 2024-08-14 09:26:06,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2589020.0, ans=0.125 2024-08-14 09:26:06,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2589020.0, ans=0.125 2024-08-14 09:26:11,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2589020.0, ans=0.125 2024-08-14 09:26:20,506 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 09:26:23,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2589120.0, ans=0.2 2024-08-14 09:26:35,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12550, loss[loss=0.1003, beats_loss=0.009774, ecapa_loss=0.0001138, whisper_loss=0.08941, over 15981.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001572, whisper_loss=0.09088, over 3918909.65 frames. ], batch size: 59, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:26:46,299 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 09:26:58,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2589320.0, ans=0.0 2024-08-14 09:27:08,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=15.0 2024-08-14 09:27:10,552 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-14 09:27:19,662 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 09:27:29,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.466e+01 2.734e+01 3.063e+01 5.302e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-14 09:27:39,943 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 09:27:54,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12600, loss[loss=0.09339, beats_loss=0.01339, ecapa_loss=0.0001421, whisper_loss=0.07858, over 22544.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001573, whisper_loss=0.09168, over 3904610.64 frames. ], batch size: 94, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:28:58,023 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-14 09:29:00,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2590020.0, ans=0.2 2024-08-14 09:29:13,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2590120.0, ans=0.125 2024-08-14 09:29:29,738 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12650, loss[loss=0.09406, beats_loss=0.0125, ecapa_loss=0.0001194, whisper_loss=0.08037, over 20305.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01072, ecapa_loss=0.0001573, whisper_loss=0.09213, over 3925493.03 frames. ], batch size: 79, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:29:58,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2590320.0, ans=0.125 2024-08-14 09:30:12,220 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 09:30:14,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2024-08-14 09:30:21,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2590420.0, ans=0.0 2024-08-14 09:30:29,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0 2024-08-14 09:30:37,395 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.252e+01 2.572e+01 2.889e+01 4.246e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-14 09:30:54,045 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 09:31:01,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12700, loss[loss=0.09575, beats_loss=0.01122, ecapa_loss=0.0001621, whisper_loss=0.08291, over 21478.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001573, whisper_loss=0.09068, over 3900324.32 frames. ], batch size: 92, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:31:04,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2590720.0, ans=0.2 2024-08-14 09:31:19,069 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
30 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 09:31:23,803 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 38 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-14 09:31:37,328 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 09:31:42,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. limit=10.0 2024-08-14 09:31:46,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2591020.0, ans=0.125 2024-08-14 09:31:49,014 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 09:32:01,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2591120.0, ans=0.125 2024-08-14 09:32:01,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-14 09:32:04,069 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-14 09:32:14,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2591220.0, ans=0.0 2024-08-14 09:32:15,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12750, loss[loss=0.1157, beats_loss=0.009497, ecapa_loss=0.000163, whisper_loss=0.1046, over 21669.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001569, whisper_loss=0.09137, over 3921078.07 frames. 
], batch size: 87, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:32:29,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2591320.0, ans=0.125 2024-08-14 09:32:30,768 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 09:32:51,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2591420.0, ans=0.0 2024-08-14 09:33:07,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.400e+01 2.605e+01 3.000e+01 2.756e+02, threshold=5.209e+01, percent-clipped=1.0 2024-08-14 09:33:27,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2591620.0, ans=0.05 2024-08-14 09:33:30,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12800, loss[loss=0.1141, beats_loss=0.01086, ecapa_loss=0.000161, whisper_loss=0.1016, over 19252.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001584, whisper_loss=0.09108, over 3912142.63 frames. ], batch size: 73, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:33:41,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.85 vs. limit=22.5 2024-08-14 09:33:58,189 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 09:34:25,288 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 09:34:32,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2592120.0, ans=0.125 2024-08-14 09:35:10,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2024-08-14 09:35:13,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2592120.0, ans=0.2 2024-08-14 09:35:17,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12850, loss[loss=0.1077, beats_loss=0.009849, ecapa_loss=0.0001747, whisper_loss=0.09612, over 22603.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0109, ecapa_loss=0.0001587, whisper_loss=0.09038, over 3918194.56 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:35:27,663 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 09:35:56,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2592320.0, ans=0.125 2024-08-14 09:36:12,042 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 09:36:23,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.339e+01 2.541e+01 2.791e+01 1.384e+02, threshold=5.082e+01, percent-clipped=3.0 2024-08-14 09:36:46,334 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12900, loss[loss=0.1122, beats_loss=0.01086, ecapa_loss=0.000194, whisper_loss=0.0994, over 20694.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01086, ecapa_loss=0.0001595, whisper_loss=0.09026, over 3860173.84 frames. ], batch size: 87, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:36:51,067 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 09:37:25,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2592820.0, ans=0.2 2024-08-14 09:37:31,745 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-14 09:37:47,391 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 09:37:51,783 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 09:37:56,347 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 09:37:56,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2593020.0, ans=0.1 2024-08-14 09:38:04,662 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 09:38:05,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.49 vs. limit=10.0 2024-08-14 09:38:15,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2593120.0, ans=0.125 2024-08-14 09:38:37,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 12950, loss[loss=0.1101, beats_loss=0.009156, ecapa_loss=0.0001807, whisper_loss=0.09912, over 22157.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001593, whisper_loss=0.09067, over 3886499.44 frames. ], batch size: 90, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:38:38,188 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 09:38:41,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2593220.0, ans=0.1 2024-08-14 09:38:52,983 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 09:39:01,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=12.0 2024-08-14 09:39:15,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2593420.0, ans=0.125 2024-08-14 09:39:50,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.412e+01 2.710e+01 3.075e+01 4.932e+01, threshold=5.420e+01, percent-clipped=0.0 2024-08-14 09:40:01,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2593620.0, ans=0.0 2024-08-14 09:40:09,927 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 09:40:17,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2593620.0, ans=0.0 2024-08-14 09:40:20,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2593620.0, ans=0.0 2024-08-14 09:40:21,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2024-08-14 09:40:27,860 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13000, loss[loss=0.1175, beats_loss=0.008022, ecapa_loss=0.000168, whisper_loss=0.1078, over 16122.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001591, whisper_loss=0.09093, over 3891027.93 frames. 
], batch size: 61, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:40:39,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2593720.0, ans=0.125 2024-08-14 09:40:51,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2593820.0, ans=0.05 2024-08-14 09:40:51,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2593820.0, ans=0.0 2024-08-14 09:41:18,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2024-08-14 09:41:21,309 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 09:41:26,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2593920.0, ans=0.1 2024-08-14 09:41:26,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2593920.0, ans=0.2 2024-08-14 09:41:31,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2593920.0, ans=0.0 2024-08-14 09:41:48,305 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 09:41:52,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2594020.0, ans=0.0 2024-08-14 09:42:22,881 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13050, loss[loss=0.1113, beats_loss=0.01109, ecapa_loss=0.0001363, whisper_loss=0.09881, over 23351.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001581, whisper_loss=0.09063, over 3879057.91 frames. 
], batch size: 90, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:42:45,955 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 09:42:54,180 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-14 09:43:02,437 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 09:43:02,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2594420.0, ans=0.2 2024-08-14 09:43:20,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2594420.0, ans=0.0 2024-08-14 09:43:32,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.338e+01 2.607e+01 2.942e+01 4.688e+01, threshold=5.215e+01, percent-clipped=0.0 2024-08-14 09:44:03,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13100, loss[loss=0.1092, beats_loss=0.00933, ecapa_loss=0.000162, whisper_loss=0.09821, over 21575.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001566, whisper_loss=0.09033, over 3881083.80 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:44:13,709 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 09:44:38,378 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 09:44:48,875 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 09:44:54,632 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 09:45:06,009 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 09:45:09,274 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.228e+01 2024-08-14 09:45:10,285 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-14 09:45:15,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2595120.0, ans=0.0 2024-08-14 09:45:29,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13150, loss[loss=0.1093, beats_loss=0.009834, ecapa_loss=0.0001686, whisper_loss=0.09778, over 21268.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001564, whisper_loss=0.09075, over 3880134.91 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:45:36,395 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 09:45:38,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2595220.0, ans=0.125 2024-08-14 09:45:39,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2595220.0, ans=0.125 2024-08-14 09:45:56,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2024-08-14 09:46:02,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2595420.0, ans=0.0 2024-08-14 09:46:10,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2595420.0, ans=0.0 2024-08-14 09:46:12,053 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
28 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 09:46:20,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.308e+01 2.613e+01 2.975e+01 4.681e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-14 09:46:23,324 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 09:46:30,890 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 09:46:36,624 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 09:46:36,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2595620.0, ans=0.0 2024-08-14 09:46:38,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2595620.0, ans=0.0 2024-08-14 09:46:39,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2595620.0, ans=0.125 2024-08-14 09:46:42,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13200, loss[loss=0.07481, beats_loss=0.009728, ecapa_loss=0.0001976, whisper_loss=0.0631, over 17744.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001563, whisper_loss=0.09082, over 3881708.24 frames. ], batch size: 76, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:46:50,317 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 09:47:14,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2595920.0, ans=0.125 2024-08-14 09:47:28,907 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 09:47:39,197 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 09:47:45,180 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 09:48:08,126 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 09:48:12,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2024-08-14 09:48:16,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13250, loss[loss=0.09791, beats_loss=0.01232, ecapa_loss=0.0001093, whisper_loss=0.0845, over 16655.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001562, whisper_loss=0.09099, over 3860928.55 frames. ], batch size: 65, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:48:33,007 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 33 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 09:48:53,196 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-14 09:49:06,655 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 09:49:22,998 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 09:49:26,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.423e+01 2.644e+01 3.025e+01 2.002e+02, threshold=5.289e+01, percent-clipped=3.0 2024-08-14 09:49:29,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-14 09:49:39,321 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 09:49:41,428 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 09:49:45,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2596620.0, ans=0.125 2024-08-14 09:49:49,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2596620.0, ans=0.2 2024-08-14 09:49:57,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13300, loss[loss=0.08133, beats_loss=0.0104, ecapa_loss=0.0001614, whisper_loss=0.06932, over 19403.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001561, whisper_loss=0.0915, over 3872168.38 frames. ], batch size: 78, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:50:11,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2596720.0, ans=0.1 2024-08-14 09:50:17,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-08-14 09:50:48,431 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 09:50:58,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.68 vs. limit=22.5 2024-08-14 09:51:09,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2597020.0, ans=0.0 2024-08-14 09:51:30,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2597220.0, ans=0.07 2024-08-14 09:51:31,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13350, loss[loss=0.1213, beats_loss=0.009323, ecapa_loss=0.000156, whisper_loss=0.1104, over 19735.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001557, whisper_loss=0.09133, over 3850466.49 frames. ], batch size: 77, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:51:31,332 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 09:51:31,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2597220.0, ans=0.1 2024-08-14 09:52:05,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2597420.0, ans=0.1 2024-08-14 09:52:07,419 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 09:52:07,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2597420.0, ans=0.125 2024-08-14 09:52:19,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2597520.0, ans=0.125 2024-08-14 09:52:23,957 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 09:52:25,066 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.409e+01 2.670e+01 3.063e+01 5.921e+01, threshold=5.339e+01, percent-clipped=1.0 2024-08-14 09:52:42,614 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 09:52:47,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2597720.0, ans=0.125 2024-08-14 09:52:47,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13400, loss[loss=0.1098, beats_loss=0.01115, ecapa_loss=0.0001177, whisper_loss=0.09744, over 24045.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001556, whisper_loss=0.09081, over 3840283.55 frames. ], batch size: 91, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:52:56,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2597720.0, ans=0.025 2024-08-14 09:52:57,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2597720.0, ans=0.0 2024-08-14 09:52:57,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2597720.0, ans=0.0 2024-08-14 09:53:16,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2597820.0, ans=0.2 2024-08-14 09:53:19,463 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 09:53:22,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-14 09:53:26,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2597920.0, ans=0.1 2024-08-14 09:53:27,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2597920.0, ans=0.0 2024-08-14 09:53:28,613 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 09:53:42,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2598020.0, ans=0.125 2024-08-14 09:53:42,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. 
limit=15.0 2024-08-14 09:54:05,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2598220.0, ans=10.0 2024-08-14 09:54:06,824 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13450, loss[loss=0.102, beats_loss=0.01155, ecapa_loss=0.0001817, whisper_loss=0.08867, over 21367.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001573, whisper_loss=0.09071, over 3867859.98 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:54:12,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2598220.0, ans=0.5 2024-08-14 09:54:13,843 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 09:54:20,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2024-08-14 09:54:28,672 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 09:54:30,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2598320.0, ans=0.0 2024-08-14 09:54:51,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2598520.0, ans=0.125 2024-08-14 09:54:59,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.340e+01 2.558e+01 2.955e+01 5.061e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-14 09:54:59,635 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 09:55:04,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2598520.0, ans=0.0 2024-08-14 09:55:14,313 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 09:55:20,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13500, loss[loss=0.1119, beats_loss=0.009532, ecapa_loss=0.0001805, whisper_loss=0.1006, over 18007.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001581, whisper_loss=0.09172, over 3871336.52 frames. ], batch size: 72, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:55:27,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2598720.0, ans=0.125 2024-08-14 09:55:34,766 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 09:55:52,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2024-08-14 09:56:23,681 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-14 09:56:29,222 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-14 09:56:32,368 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 09:56:33,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13550, loss[loss=0.09843, beats_loss=0.01394, ecapa_loss=0.0001229, whisper_loss=0.08326, over 20418.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001568, whisper_loss=0.09126, over 3837793.65 frames. 
], batch size: 83, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:56:38,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2599220.0, ans=0.0 2024-08-14 09:57:01,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=2599420.0, ans=0.02 2024-08-14 09:57:08,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2599420.0, ans=0.125 2024-08-14 09:57:24,013 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.303e+01 2.544e+01 2.977e+01 7.464e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-14 09:57:24,350 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 09:57:25,662 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-14 09:57:31,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2599620.0, ans=0.125 2024-08-14 09:57:42,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-14 09:57:45,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13600, loss[loss=0.1093, beats_loss=0.01085, ecapa_loss=0.0001315, whisper_loss=0.09716, over 16214.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01074, ecapa_loss=0.0001564, whisper_loss=0.0914, over 3872785.61 frames. ], batch size: 62, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:57:50,540 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 09:57:51,751 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 09:58:02,169 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 09:58:15,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2599920.0, ans=0.0 2024-08-14 09:58:19,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2599920.0, ans=0.0 2024-08-14 09:58:21,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2599920.0, ans=0.0 2024-08-14 09:58:25,340 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-260000.pt 2024-08-14 09:58:43,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2600020.0, ans=0.125 2024-08-14 09:58:53,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2600120.0, ans=0.125 2024-08-14 09:59:01,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13650, loss[loss=0.1038, beats_loss=0.01274, ecapa_loss=0.0001434, whisper_loss=0.08967, over 22883.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001577, whisper_loss=0.09098, over 3842727.63 frames. 
], batch size: 93, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:59:03,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2600220.0, ans=0.05 2024-08-14 09:59:07,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2600220.0, ans=0.125 2024-08-14 09:59:20,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2600320.0, ans=0.1 2024-08-14 09:59:34,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2600420.0, ans=0.05 2024-08-14 09:59:38,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2600420.0, ans=0.1 2024-08-14 09:59:51,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+01 2.361e+01 2.642e+01 3.028e+01 5.099e+01, threshold=5.285e+01, percent-clipped=1.0 2024-08-14 10:00:13,505 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13700, loss[loss=0.1054, beats_loss=0.01141, ecapa_loss=0.0001379, whisper_loss=0.09257, over 22699.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001566, whisper_loss=0.09131, over 3846904.67 frames. ], batch size: 91, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:00:15,191 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 10:00:17,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2600720.0, ans=0.1 2024-08-14 10:00:22,648 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
17 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 10:00:26,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2024-08-14 10:00:44,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2600920.0, ans=0.125 2024-08-14 10:00:51,252 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 10:01:06,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2601020.0, ans=0.0 2024-08-14 10:01:17,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2601120.0, ans=0.07 2024-08-14 10:01:24,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=22.5 2024-08-14 10:01:26,311 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13750, loss[loss=0.1001, beats_loss=0.01063, ecapa_loss=0.0001626, whisper_loss=0.08784, over 21324.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001564, whisper_loss=0.09159, over 3845267.37 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:01:37,921 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 10:01:42,541 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 10:01:48,234 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 10:01:54,304 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 10:02:01,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2601420.0, ans=0.1 2024-08-14 10:02:03,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2601420.0, ans=0.0 2024-08-14 10:02:08,880 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 10:02:16,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2601520.0, ans=0.1 2024-08-14 10:02:17,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.453e+01 2.720e+01 3.144e+01 4.957e+01, threshold=5.441e+01, percent-clipped=0.0 2024-08-14 10:02:23,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2601620.0, ans=0.2 2024-08-14 10:02:24,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2601620.0, ans=0.1 2024-08-14 10:02:39,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13800, loss[loss=0.09576, beats_loss=0.01194, ecapa_loss=0.0001469, whisper_loss=0.08234, over 23220.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001568, whisper_loss=0.09147, over 3830675.27 frames. ], batch size: 94, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:02:39,891 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-14 10:03:19,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2601920.0, ans=0.0 2024-08-14 10:03:51,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13850, loss[loss=0.08223, beats_loss=0.01278, ecapa_loss=0.0001554, whisper_loss=0.0679, over 13417.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001562, whisper_loss=0.09149, over 3833645.65 frames. ], batch size: 54, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:04:09,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2024-08-14 10:04:26,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2602420.0, ans=0.07 2024-08-14 10:04:26,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2602420.0, ans=0.2 2024-08-14 10:04:27,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2602420.0, ans=0.0 2024-08-14 10:04:35,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2602520.0, ans=0.0 2024-08-14 10:04:36,891 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 10:04:40,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.427e+01 2.695e+01 2.897e+01 4.823e+02, threshold=5.391e+01, percent-clipped=1.0 2024-08-14 10:04:41,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5 2024-08-14 10:04:54,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2024-08-14 10:05:01,812 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 10:05:02,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13900, loss[loss=0.117, beats_loss=0.008148, ecapa_loss=0.0001706, whisper_loss=0.1072, over 22322.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001565, whisper_loss=0.09079, over 3833010.75 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:05:04,678 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 35 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-14 10:05:06,370 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-14 10:05:16,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2602820.0, ans=0.1 2024-08-14 10:05:21,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2602820.0, ans=0.1 2024-08-14 10:05:38,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2602920.0, ans=0.125 2024-08-14 10:05:48,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2603020.0, ans=0.125 2024-08-14 10:05:52,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2603020.0, ans=0.125 2024-08-14 10:05:58,199 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 10:06:06,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2603120.0, ans=0.125 2024-08-14 10:06:08,016 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 10:06:15,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 13950, loss[loss=0.1179, beats_loss=0.009989, ecapa_loss=0.0001426, whisper_loss=0.1065, over 16682.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001564, whisper_loss=0.09145, over 3831210.77 frames. 
], batch size: 64, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:06:15,346 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-14 10:06:22,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2603220.0, ans=0.1 2024-08-14 10:06:25,170 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 10:06:38,093 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-14 10:06:43,469 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-14 10:06:45,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2603420.0, ans=0.1 2024-08-14 10:07:04,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.381e+01 2.587e+01 2.937e+01 5.454e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-14 10:07:18,009 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 10:07:18,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2603620.0, ans=0.125 2024-08-14 10:07:19,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2603620.0, ans=0.2 2024-08-14 10:07:22,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.64 vs. limit=10.0 2024-08-14 10:07:26,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14000, loss[loss=0.09637, beats_loss=0.01122, ecapa_loss=0.0001427, whisper_loss=0.08373, over 15120.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09105, over 3850787.39 frames. ], batch size: 57, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:07:28,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2603720.0, ans=0.0 2024-08-14 10:07:38,275 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 10:07:41,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2603820.0, ans=0.04949747468305833 2024-08-14 10:08:06,951 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 10:08:08,450 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 10:08:20,197 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-14 10:08:37,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2604120.0, ans=0.09899494936611666 2024-08-14 10:08:39,332 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14050, loss[loss=0.1083, beats_loss=0.009319, ecapa_loss=0.000166, whisper_loss=0.09736, over 22507.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.0001547, whisper_loss=0.09143, over 3848071.85 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:08:39,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2604220.0, ans=0.2 2024-08-14 10:08:52,390 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 10:08:53,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2604320.0, ans=0.125 2024-08-14 10:09:17,018 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 10:09:24,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2604520.0, ans=0.0 2024-08-14 10:09:29,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.346e+01 2.567e+01 2.904e+01 5.000e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-14 10:09:37,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2604620.0, ans=0.0 2024-08-14 10:09:50,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14100, loss[loss=0.09205, beats_loss=0.01302, ecapa_loss=0.0001446, whisper_loss=0.07759, over 18989.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.000156, whisper_loss=0.0909, over 3861722.18 frames. ], batch size: 75, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:09:52,536 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 10:09:55,698 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 10:10:02,921 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 10:10:11,649 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.958e+01 2024-08-14 10:10:14,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2604820.0, ans=0.2 2024-08-14 10:10:35,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2605020.0, ans=0.0 2024-08-14 10:10:47,348 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-14 10:10:58,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-08-14 10:11:00,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2605120.0, ans=0.1 2024-08-14 10:11:03,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14150, loss[loss=0.1062, beats_loss=0.01116, ecapa_loss=0.0001265, whisper_loss=0.09377, over 23602.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001561, whisper_loss=0.09072, over 3808923.03 frames. ], batch size: 92, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:11:32,188 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 10:11:40,487 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
23 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-14 10:11:40,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2605420.0, ans=0.0 2024-08-14 10:11:52,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2605520.0, ans=0.125 2024-08-14 10:11:53,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.390e+01 2.553e+01 2.829e+01 7.364e+01, threshold=5.106e+01, percent-clipped=2.0 2024-08-14 10:11:57,888 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 10:11:59,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2605620.0, ans=0.1 2024-08-14 10:12:15,764 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14200, loss[loss=0.1041, beats_loss=0.01226, ecapa_loss=0.0001437, whisper_loss=0.09038, over 21843.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001563, whisper_loss=0.0912, over 3835472.13 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:12:33,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2605820.0, ans=0.2 2024-08-14 10:12:46,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.71 vs. 
limit=22.5 2024-08-14 10:12:53,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2605920.0, ans=0.125 2024-08-14 10:13:16,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2606120.0, ans=0.2 2024-08-14 10:13:27,330 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14250, loss[loss=0.09663, beats_loss=0.01006, ecapa_loss=0.0001698, whisper_loss=0.08486, over 20691.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001563, whisper_loss=0.09089, over 3842698.66 frames. ], batch size: 83, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:13:29,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2606220.0, ans=10.0 2024-08-14 10:13:38,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.17 vs. 
limit=15.0 2024-08-14 10:13:48,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2606320.0, ans=0.0 2024-08-14 10:13:53,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2606320.0, ans=0.2 2024-08-14 10:13:55,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2606420.0, ans=0.0 2024-08-14 10:13:56,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2606420.0, ans=0.1 2024-08-14 10:13:57,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2606420.0, ans=0.125 2024-08-14 10:13:58,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2606420.0, ans=0.125 2024-08-14 10:14:11,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2606520.0, ans=0.125 2024-08-14 10:14:18,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.438e+01 2.661e+01 3.044e+01 6.273e+01, threshold=5.322e+01, percent-clipped=2.0 2024-08-14 10:14:19,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2606520.0, ans=0.125 2024-08-14 10:14:20,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2606520.0, ans=0.125 2024-08-14 10:14:22,991 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 10:14:37,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.78 vs. 
limit=15.0 2024-08-14 10:14:39,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14300, loss[loss=0.1172, beats_loss=0.009921, ecapa_loss=0.0001421, whisper_loss=0.1059, over 23534.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001565, whisper_loss=0.09093, over 3879274.08 frames. ], batch size: 91, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:15:08,849 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 10:15:21,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2606920.0, ans=0.04949747468305833 2024-08-14 10:15:36,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2607020.0, ans=0.125 2024-08-14 10:15:42,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2607120.0, ans=0.0 2024-08-14 10:15:43,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-14 10:15:49,066 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 10:15:55,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14350, loss[loss=0.104, beats_loss=0.01228, ecapa_loss=0.000129, whisper_loss=0.09041, over 19872.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.0001563, whisper_loss=0.09122, over 3915196.56 frames. ], batch size: 77, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:16:08,153 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 10:16:10,324 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 10:16:36,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2607420.0, ans=0.1 2024-08-14 10:16:40,401 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 10:16:52,580 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 10:16:56,425 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.361e+01 2.607e+01 2.997e+01 4.259e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-14 10:17:13,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2607620.0, ans=0.125 2024-08-14 10:17:23,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14400, loss[loss=0.09315, beats_loss=0.01186, ecapa_loss=0.0001755, whisper_loss=0.07954, over 17874.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001569, whisper_loss=0.09138, over 3930252.51 frames. ], batch size: 75, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:17:43,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2607820.0, ans=0.125 2024-08-14 10:18:38,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2608120.0, ans=0.0 2024-08-14 10:18:39,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2608120.0, ans=0.125 2024-08-14 10:18:45,782 INFO [train_multi_KD3.py:1116] (0/4) Epoch 18, batch 14450, loss[loss=0.1137, beats_loss=0.009133, ecapa_loss=0.0001601, whisper_loss=0.103, over 17684.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01064, ecapa_loss=0.0001564, whisper_loss=0.09184, over 3933212.12 frames. 
], batch size: 69, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:18:46,082 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 26 from LS+wenet, 20 from Vox, 12 fro AS 2024-08-14 10:18:53,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2608220.0, ans=0.95 2024-08-14 10:18:59,710 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 10:19:09,339 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 10:19:12,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2608420.0, ans=0.125 2024-08-14 10:19:14,039 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 10:19:24,115 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 10:19:26,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-14 10:19:30,297 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-18.pt 2024-08-14 10:19:55,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 0, loss[loss=0.1025, beats_loss=0.008999, ecapa_loss=0.0001596, whisper_loss=0.09193, over 20954.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.008999, ecapa_loss=0.0001596, whisper_loss=0.09193, over 20954.00 frames. 
], batch size: 83, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:19:55,473 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 10:20:37,897 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005486, whisper_loss=0.2484, over 922467.00 frames. 2024-08-14 10:20:53,993 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on SV_voxceleb1: loss=0.004382, beats_loss=0, ecapa_loss=0.0004382, whisper_loss=0, over 939242.00 frames. 2024-08-14 10:21:32,443 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1843, 4.0284, 3.5783, 3.8447], device='cuda:0') 2024-08-14 10:22:56,777 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 10:22:56,785 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 10:23:09,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.576e+01 3.022e+01 6.974e+01, threshold=5.152e+01, percent-clipped=1.0 2024-08-14 10:23:46,832 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 10:23:52,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2608720.0, ans=0.125 2024-08-14 10:24:08,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.18 vs. 
limit=15.0 2024-08-14 10:24:28,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2608820.0, ans=0.125 2024-08-14 10:24:36,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2608820.0, ans=0.04949747468305833 2024-08-14 10:25:06,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 50, loss[loss=0.1042, beats_loss=0.007678, ecapa_loss=0.0001803, whisper_loss=0.09471, over 21637.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.009616, ecapa_loss=0.0001645, whisper_loss=0.09242, over 901242.27 frames. ], batch size: 86, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:25:35,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=15.0 2024-08-14 10:25:37,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.07 vs. limit=22.5 2024-08-14 10:25:44,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2609120.0, ans=0.0 2024-08-14 10:25:47,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2609120.0, ans=0.2 2024-08-14 10:26:14,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2609220.0, ans=0.125 2024-08-14 10:26:25,476 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 10:26:25,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2024-08-14 10:26:51,407 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 10:26:56,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2609420.0, ans=0.125 2024-08-14 10:27:04,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 100, loss[loss=0.09399, beats_loss=0.008793, ecapa_loss=0.0001712, whisper_loss=0.08348, over 14607.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.00984, ecapa_loss=0.0001598, whisper_loss=0.08986, over 1550562.57 frames. ], batch size: 59, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:27:16,385 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.614e+01 2.833e+01 3.144e+01 8.943e+01, threshold=5.666e+01, percent-clipped=3.0 2024-08-14 10:27:46,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2609620.0, ans=0.1 2024-08-14 10:27:58,214 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 10:27:58,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-14 10:28:17,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2609820.0, ans=0.125 2024-08-14 10:28:19,453 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 10:28:56,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 150, loss[loss=0.08026, beats_loss=0.01424, ecapa_loss=0.0001468, whisper_loss=0.06454, over 18709.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009852, ecapa_loss=0.0001587, whisper_loss=0.09088, over 2072312.09 frames. ], batch size: 76, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:28:58,290 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 10:29:04,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2610020.0, ans=0.125 2024-08-14 10:29:20,023 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 10:29:21,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2610120.0, ans=0.0 2024-08-14 10:29:34,642 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 10:29:41,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2610220.0, ans=0.125 2024-08-14 10:29:45,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2610220.0, ans=0.125 2024-08-14 10:29:50,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2610320.0, ans=0.125 2024-08-14 10:29:58,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2610320.0, ans=0.125 2024-08-14 10:30:01,635 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 10:30:01,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2610420.0, ans=0.1 2024-08-14 10:30:03,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=22.5 2024-08-14 10:30:18,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 200, loss[loss=0.07663, beats_loss=0.01416, ecapa_loss=0.0001315, whisper_loss=0.06116, over 15406.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01001, ecapa_loss=0.0001575, whisper_loss=0.09083, over 2469434.57 frames. ], batch size: 62, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:30:20,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2610520.0, ans=0.125 2024-08-14 10:30:25,920 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.407e+01 2.774e+01 3.039e+01 4.574e+01, threshold=5.548e+01, percent-clipped=0.0 2024-08-14 10:30:33,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2610620.0, ans=0.125 2024-08-14 10:30:34,376 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-14 10:30:34,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2610620.0, ans=0.1 2024-08-14 10:30:40,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2610620.0, ans=0.025 2024-08-14 10:30:50,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2610720.0, ans=10.0 2024-08-14 10:31:02,862 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 10:31:11,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2610820.0, ans=0.125 2024-08-14 10:31:20,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2610920.0, ans=0.125 2024-08-14 10:31:24,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2610920.0, ans=0.0 2024-08-14 10:31:33,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2610920.0, ans=0.125 2024-08-14 10:31:37,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 250, loss[loss=0.09354, beats_loss=0.01204, ecapa_loss=0.0001376, whisper_loss=0.08013, over 20235.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01005, ecapa_loss=0.0001569, whisper_loss=0.09234, over 2781357.66 frames. ], batch size: 81, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:31:43,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.81 vs. 
limit=15.0 2024-08-14 10:32:20,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2611220.0, ans=0.0 2024-08-14 10:32:25,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2611220.0, ans=0.025 2024-08-14 10:32:51,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2611420.0, ans=0.125 2024-08-14 10:33:00,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2611520.0, ans=0.125 2024-08-14 10:33:01,466 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 300, loss[loss=0.09379, beats_loss=0.01043, ecapa_loss=0.0001705, whisper_loss=0.08166, over 16814.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01026, ecapa_loss=0.0001578, whisper_loss=0.09059, over 2987931.73 frames. ], batch size: 69, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:33:03,238 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 10:33:09,056 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 10:33:10,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.366e+01 2.598e+01 2.945e+01 2.183e+02, threshold=5.197e+01, percent-clipped=2.0 2024-08-14 10:33:11,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2611520.0, ans=0.0 2024-08-14 10:33:15,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2611520.0, ans=0.125 2024-08-14 10:33:28,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. 
limit=6.0 2024-08-14 10:33:29,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2611620.0, ans=0.0 2024-08-14 10:33:36,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2024-08-14 10:33:44,316 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:33:45,548 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 10:33:56,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2611820.0, ans=0.0 2024-08-14 10:33:56,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2024-08-14 10:34:14,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2611920.0, ans=0.125 2024-08-14 10:34:29,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 350, loss[loss=0.08825, beats_loss=0.0119, ecapa_loss=0.0001406, whisper_loss=0.07494, over 18240.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01025, ecapa_loss=0.0001586, whisper_loss=0.09103, over 3184053.69 frames. ], batch size: 73, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:34:41,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2612020.0, ans=0.125 2024-08-14 10:34:46,421 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 10:34:57,978 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
24 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 10:35:16,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2612220.0, ans=0.1 2024-08-14 10:35:17,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2612220.0, ans=0.0 2024-08-14 10:35:20,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2612320.0, ans=0.1 2024-08-14 10:35:30,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2612320.0, ans=0.0 2024-08-14 10:35:37,410 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-14 10:35:48,797 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 10:35:53,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 400, loss[loss=0.09111, beats_loss=0.0107, ecapa_loss=0.0001686, whisper_loss=0.07873, over 16352.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.000158, whisper_loss=0.09061, over 3315819.78 frames. ], batch size: 62, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:36:01,861 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+01 2.324e+01 2.549e+01 2.797e+01 3.225e+02, threshold=5.099e+01, percent-clipped=2.0 2024-08-14 10:36:08,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. 
limit=22.5 2024-08-14 10:36:16,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2612620.0, ans=0.0 2024-08-14 10:36:16,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2612620.0, ans=0.125 2024-08-14 10:36:16,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2612620.0, ans=0.125 2024-08-14 10:36:25,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2612720.0, ans=0.07 2024-08-14 10:36:29,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=12.0 2024-08-14 10:36:38,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2612720.0, ans=0.125 2024-08-14 10:37:17,650 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 450, loss[loss=0.09144, beats_loss=0.01113, ecapa_loss=0.000151, whisper_loss=0.0788, over 17347.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001589, whisper_loss=0.08989, over 3432123.83 frames. ], batch size: 71, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:37:17,823 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 10:37:25,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2613020.0, ans=0.2 2024-08-14 10:37:25,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2613020.0, ans=0.125 2024-08-14 10:37:29,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2613020.0, ans=0.07 2024-08-14 10:37:50,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-14 10:37:58,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-14 10:38:20,188 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 10:38:21,936 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 10:38:39,192 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2024-08-14 10:38:47,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 500, loss[loss=0.1071, beats_loss=0.00954, ecapa_loss=0.0001392, whisper_loss=0.09619, over 20417.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001575, whisper_loss=0.0903, over 3538671.22 frames. ], batch size: 77, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:38:56,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.547e+01 2.928e+01 5.420e+01, threshold=5.093e+01, percent-clipped=1.0 2024-08-14 10:39:07,221 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
32 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 10:39:44,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2613820.0, ans=0.0 2024-08-14 10:40:04,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2613920.0, ans=0.125 2024-08-14 10:40:06,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2613920.0, ans=0.125 2024-08-14 10:40:17,704 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 10:40:17,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2614020.0, ans=0.125 2024-08-14 10:40:18,804 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 550, loss[loss=0.1086, beats_loss=0.01153, ecapa_loss=0.0001791, whisper_loss=0.09531, over 16479.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001564, whisper_loss=0.09098, over 3622949.67 frames. ], batch size: 66, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:40:26,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2614020.0, ans=0.125 2024-08-14 10:40:29,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2024-08-14 10:40:29,985 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 10:40:44,713 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 29 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 10:41:26,145 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 10:41:38,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2024-08-14 10:41:46,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 600, loss[loss=0.09364, beats_loss=0.01207, ecapa_loss=0.0001311, whisper_loss=0.08026, over 18802.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.000156, whisper_loss=0.09064, over 3687234.70 frames. ], batch size: 72, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:41:50,883 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.554e+00 2024-08-14 10:41:52,669 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.834e-01 2024-08-14 10:41:53,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.288e+01 2.520e+01 2.805e+01 9.045e+01, threshold=5.041e+01, percent-clipped=2.0 2024-08-14 10:42:09,067 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 10:42:15,985 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:42:19,168 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 10:42:19,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2614720.0, ans=0.125 2024-08-14 10:42:25,136 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 10:42:36,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2614820.0, ans=0.125 2024-08-14 10:42:36,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2614820.0, ans=0.1 2024-08-14 10:42:38,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.31 vs. limit=22.5 2024-08-14 10:42:49,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2614920.0, ans=0.0 2024-08-14 10:43:00,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2614920.0, ans=0.125 2024-08-14 10:43:03,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 650, loss[loss=0.1029, beats_loss=0.01015, ecapa_loss=0.0001635, whisper_loss=0.09111, over 18823.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001547, whisper_loss=0.09042, over 3751271.40 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:43:11,493 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 10:43:22,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.32 vs. 
limit=15.0 2024-08-14 10:43:26,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2615120.0, ans=0.1 2024-08-14 10:43:35,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2615220.0, ans=0.0 2024-08-14 10:43:45,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-14 10:43:52,421 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-14 10:44:03,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2615420.0, ans=0.035 2024-08-14 10:44:03,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2615420.0, ans=0.0 2024-08-14 10:44:13,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 700, loss[loss=0.1011, beats_loss=0.01089, ecapa_loss=0.0001195, whisper_loss=0.08906, over 14841.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001557, whisper_loss=0.09047, over 3768957.31 frames. ], batch size: 55, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:44:15,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2615520.0, ans=0.125 2024-08-14 10:44:19,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2615520.0, ans=0.1 2024-08-14 10:44:19,898 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.455e+01 2.625e+01 2.898e+01 4.319e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-14 10:44:21,451 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
26 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-14 10:44:21,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2615520.0, ans=0.125 2024-08-14 10:44:25,260 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 10:44:35,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2615620.0, ans=0.1 2024-08-14 10:44:41,670 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 10:44:57,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2615820.0, ans=0.125 2024-08-14 10:45:09,271 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 10:45:10,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2615920.0, ans=0.2 2024-08-14 10:45:12,049 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 10:45:14,905 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 31 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-14 10:45:20,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 750, loss[loss=0.07628, beats_loss=0.01196, ecapa_loss=0.0001785, whisper_loss=0.06254, over 20828.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.000155, whisper_loss=0.0902, over 3783556.09 frames. 
], batch size: 90, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:45:23,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=2616020.0, ans=0.02 2024-08-14 10:45:30,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2616020.0, ans=0.1 2024-08-14 10:45:36,647 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 10:45:43,557 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 10:46:01,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2616320.0, ans=0.09899494936611666 2024-08-14 10:46:02,284 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 28 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 10:46:15,378 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 10:46:20,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2616420.0, ans=0.0 2024-08-14 10:46:22,331 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-14 10:46:27,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 800, loss[loss=0.09821, beats_loss=0.01084, ecapa_loss=0.0001483, whisper_loss=0.08588, over 21252.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001554, whisper_loss=0.0898, over 3801813.22 frames. ], batch size: 84, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:46:33,959 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.304e+01 2.552e+01 2.845e+01 4.485e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-14 10:46:34,133 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 10:46:35,527 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 10:46:38,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2616520.0, ans=0.0 2024-08-14 10:46:39,637 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 10:46:42,489 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 10:46:42,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2616620.0, ans=0.07 2024-08-14 10:46:54,450 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 10:46:54,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2616720.0, ans=0.2 2024-08-14 10:47:07,995 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 31 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 10:47:20,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-14 10:47:20,983 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 10:47:34,629 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 850, loss[loss=0.1004, beats_loss=0.01144, ecapa_loss=0.0001411, whisper_loss=0.08755, over 22659.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001544, whisper_loss=0.08943, over 3775279.33 frames. ], batch size: 93, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:47:40,203 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
18 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 10:47:44,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2617020.0, ans=0.0 2024-08-14 10:48:03,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2617220.0, ans=0.1 2024-08-14 10:48:03,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2617220.0, ans=0.2 2024-08-14 10:48:05,810 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 10:48:11,174 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-14 10:48:14,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2024-08-14 10:48:16,097 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-14 10:48:42,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 900, loss[loss=0.08985, beats_loss=0.01102, ecapa_loss=0.0001346, whisper_loss=0.07749, over 21928.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01057, ecapa_loss=0.0001541, whisper_loss=0.08882, over 3802059.66 frames. ], batch size: 87, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:48:48,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2617520.0, ans=0.2 2024-08-14 10:48:49,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.308e+01 2.548e+01 2.901e+01 4.285e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-14 10:48:52,305 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 10:49:00,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-14 10:49:25,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2617820.0, ans=0.0 2024-08-14 10:49:44,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2617920.0, ans=0.125 2024-08-14 10:49:45,608 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 10:49:49,507 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 950, loss[loss=0.1115, beats_loss=0.01107, ecapa_loss=0.0001543, whisper_loss=0.09888, over 23110.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01062, ecapa_loss=0.0001526, whisper_loss=0.08867, over 3824246.03 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:50:07,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2618120.0, ans=0.1 2024-08-14 10:50:08,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2618120.0, ans=0.1 2024-08-14 10:50:11,202 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 10:50:24,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2618220.0, ans=0.125 2024-08-14 10:50:26,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2618220.0, ans=0.0 2024-08-14 10:50:34,450 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 10:50:57,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1000, loss[loss=0.09195, beats_loss=0.01152, ecapa_loss=0.0001541, whisper_loss=0.07889, over 18771.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01072, ecapa_loss=0.0001522, whisper_loss=0.08837, over 3854649.20 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:51:02,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2618520.0, ans=0.125 2024-08-14 10:51:03,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.412e+01 2.681e+01 3.043e+01 1.164e+02, threshold=5.362e+01, percent-clipped=2.0 2024-08-14 10:51:12,751 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 10:51:14,314 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 10:51:14,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2024-08-14 10:51:28,775 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
26 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-14 10:51:29,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2618720.0, ans=0.125 2024-08-14 10:51:30,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2618720.0, ans=0.125 2024-08-14 10:51:34,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2618720.0, ans=0.0 2024-08-14 10:51:34,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-14 10:51:41,002 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 10:52:03,783 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1050, loss[loss=0.0954, beats_loss=0.0107, ecapa_loss=0.0001268, whisper_loss=0.08342, over 20947.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01069, ecapa_loss=0.0001518, whisper_loss=0.08911, over 3864977.90 frames. ], batch size: 84, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:52:14,286 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 10:52:25,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2619120.0, ans=0.125 2024-08-14 10:52:33,109 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 10:52:38,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2619220.0, ans=0.125 2024-08-14 10:52:38,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2619220.0, ans=10.0 2024-08-14 10:53:04,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2024-08-14 10:53:11,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1100, loss[loss=0.09457, beats_loss=0.009736, ecapa_loss=0.0002039, whisper_loss=0.0828, over 16058.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01069, ecapa_loss=0.0001523, whisper_loss=0.08894, over 3863753.73 frames. ], batch size: 67, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:53:12,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2619520.0, ans=0.125 2024-08-14 10:53:15,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2619520.0, ans=0.0 2024-08-14 10:53:17,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.365e+01 2.665e+01 2.962e+01 1.430e+02, threshold=5.329e+01, percent-clipped=2.0 2024-08-14 10:53:38,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2619720.0, ans=0.1 2024-08-14 10:53:48,044 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 10:53:59,924 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 10:54:17,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=15.0 2024-08-14 10:54:17,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1150, loss[loss=0.09351, beats_loss=0.009287, ecapa_loss=0.0001686, whisper_loss=0.08254, over 15043.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01072, ecapa_loss=0.0001518, whisper_loss=0.08901, over 3850948.02 frames. ], batch size: 60, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:54:36,430 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 10:54:42,883 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 10:54:57,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.29 vs. limit=15.0 2024-08-14 10:54:59,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2620320.0, ans=0.0 2024-08-14 10:55:03,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2620320.0, ans=0.0 2024-08-14 10:55:07,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2620320.0, ans=10.0 2024-08-14 10:55:12,880 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 10:55:14,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2620420.0, ans=0.0 2024-08-14 10:55:24,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1200, loss[loss=0.08399, beats_loss=0.01314, ecapa_loss=0.0001436, whisper_loss=0.06941, over 22322.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01081, ecapa_loss=0.0001508, whisper_loss=0.08854, over 3831736.27 frames. ], batch size: 93, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:55:31,585 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.349e+01 2.616e+01 2.854e+01 5.362e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 10:55:34,457 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 10:55:37,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2620620.0, ans=0.125 2024-08-14 10:55:42,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2620620.0, ans=0.125 2024-08-14 10:55:45,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2024-08-14 10:56:06,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2620820.0, ans=0.1 2024-08-14 10:56:13,736 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 10:56:21,569 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 10:56:31,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1250, loss[loss=0.09018, beats_loss=0.01624, ecapa_loss=0.0001074, whisper_loss=0.07287, over 19473.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01087, ecapa_loss=0.0001507, whisper_loss=0.08811, over 3819445.44 frames. ], batch size: 79, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:56:35,178 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 10:56:46,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2621120.0, ans=0.125 2024-08-14 10:56:56,508 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 10:57:07,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.33 vs. limit=22.5 2024-08-14 10:57:11,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2621320.0, ans=0.125 2024-08-14 10:57:19,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2621320.0, ans=0.1 2024-08-14 10:57:39,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1300, loss[loss=0.116, beats_loss=0.007162, ecapa_loss=0.0001337, whisper_loss=0.1075, over 15495.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01084, ecapa_loss=0.0001516, whisper_loss=0.08805, over 3832682.81 frames. ], batch size: 54, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:57:40,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. 
limit=15.0 2024-08-14 10:57:45,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.286e+01 2.497e+01 2.754e+01 3.684e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-14 10:57:56,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2621620.0, ans=0.125 2024-08-14 10:58:01,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2621620.0, ans=0.125 2024-08-14 10:58:09,940 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 10:58:24,016 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 24 from Vox, 15 fro AS 2024-08-14 10:58:31,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2621820.0, ans=0.125 2024-08-14 10:58:38,684 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 10:58:41,324 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-14 10:58:41,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2621920.0, ans=0.1 2024-08-14 10:58:45,530 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 10:58:46,616 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1350, loss[loss=0.09629, beats_loss=0.009918, ecapa_loss=0.0001863, whisper_loss=0.08451, over 20027.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001519, whisper_loss=0.08953, over 3855936.66 frames. ], batch size: 83, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:59:18,278 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 10:59:21,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2622220.0, ans=0.09899494936611666 2024-08-14 10:59:44,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0 2024-08-14 10:59:47,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2622420.0, ans=0.07 2024-08-14 10:59:53,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1400, loss[loss=0.09389, beats_loss=0.01055, ecapa_loss=0.0001668, whisper_loss=0.08167, over 21347.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01071, ecapa_loss=0.000152, whisper_loss=0.08981, over 3872480.93 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:59:59,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.361e+01 2.575e+01 2.810e+01 4.774e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 11:00:07,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2622620.0, ans=0.125 2024-08-14 11:00:10,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2622620.0, ans=0.125 2024-08-14 11:00:13,750 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 11:00:15,035 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 16 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 11:00:50,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. 
limit=15.0 2024-08-14 11:00:59,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1450, loss[loss=0.1, beats_loss=0.009477, ecapa_loss=0.0001642, whisper_loss=0.08892, over 14717.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01069, ecapa_loss=0.0001521, whisper_loss=0.08943, over 3857972.31 frames. ], batch size: 60, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:01:32,783 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 11:01:36,599 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-14 11:01:45,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2623220.0, ans=0.2 2024-08-14 11:01:49,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2623220.0, ans=0.0 2024-08-14 11:02:04,727 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-14 11:02:23,557 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1500, loss[loss=0.08321, beats_loss=0.01163, ecapa_loss=0.0001295, whisper_loss=0.07028, over 15439.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0107, ecapa_loss=0.0001515, whisper_loss=0.08911, over 3850949.00 frames. ], batch size: 61, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:02:25,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-08-14 11:02:30,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.366e+01 2.617e+01 2.967e+01 6.359e+01, threshold=5.234e+01, percent-clipped=3.0 2024-08-14 11:02:35,638 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 11:02:38,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2623620.0, ans=0.07 2024-08-14 11:03:01,590 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.028e+01 2024-08-14 11:03:17,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2024-08-14 11:03:18,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2623820.0, ans=0.0 2024-08-14 11:03:28,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-14 11:03:35,280 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 31 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-14 11:03:37,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1550, loss[loss=0.1096, beats_loss=0.009932, ecapa_loss=0.0001532, whisper_loss=0.09816, over 22704.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01062, ecapa_loss=0.0001507, whisper_loss=0.08947, over 3835191.93 frames. ], batch size: 91, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:03:43,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2624020.0, ans=0.125 2024-08-14 11:03:49,786 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 11:04:11,983 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-14 11:04:25,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2624320.0, ans=0.125 2024-08-14 11:04:28,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2624320.0, ans=0.125 2024-08-14 11:04:52,633 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:04:54,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1600, loss[loss=0.08, beats_loss=0.01125, ecapa_loss=0.0001619, whisper_loss=0.06713, over 21230.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01066, ecapa_loss=0.0001506, whisper_loss=0.08932, over 3855734.31 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:05:01,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.369e+01 2.524e+01 2.843e+01 4.192e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-14 11:05:29,002 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.011e+01 2024-08-14 11:05:41,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2624820.0, ans=0.0 2024-08-14 11:05:42,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2624820.0, ans=0.0 2024-08-14 11:05:43,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-08-14 11:05:55,231 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 11:06:10,007 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1650, loss[loss=0.09784, beats_loss=0.008127, ecapa_loss=0.000167, whisper_loss=0.08805, over 14300.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01068, ecapa_loss=0.00015, whisper_loss=0.08943, over 3864539.43 frames. ], batch size: 57, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:06:10,475 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.312e-02 2024-08-14 11:06:14,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2625020.0, ans=0.2 2024-08-14 11:06:18,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2625020.0, ans=0.125 2024-08-14 11:06:19,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2625020.0, ans=0.125 2024-08-14 11:06:20,658 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 11:06:35,225 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 11:06:43,345 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 11:06:51,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2625220.0, ans=0.125 2024-08-14 11:07:00,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2625320.0, ans=0.2 2024-08-14 11:07:10,236 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 11:07:19,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2625420.0, ans=0.125 2024-08-14 11:07:24,133 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 11:07:25,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1700, loss[loss=0.1013, beats_loss=0.01103, ecapa_loss=0.0001505, whisper_loss=0.08874, over 22468.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01067, ecapa_loss=0.0001497, whisper_loss=0.08887, over 3840800.78 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:07:32,940 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.263e+01 2.524e+01 2.794e+01 4.972e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-14 11:08:08,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-14 11:08:15,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2625820.0, ans=0.0 2024-08-14 11:08:16,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2625820.0, ans=0.0 2024-08-14 11:08:29,907 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 11:08:32,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.67 vs. 
limit=22.5 2024-08-14 11:08:40,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2626020.0, ans=0.0 2024-08-14 11:08:40,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2626020.0, ans=0.2 2024-08-14 11:08:41,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1750, loss[loss=0.07963, beats_loss=0.00978, ecapa_loss=0.0001484, whisper_loss=0.06836, over 18526.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001505, whisper_loss=0.08951, over 3864970.69 frames. ], batch size: 72, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:08:48,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.52 vs. limit=22.5 2024-08-14 11:08:51,657 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-14 11:09:00,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2626120.0, ans=0.125 2024-08-14 11:09:17,403 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 11:09:24,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2626320.0, ans=0.125 2024-08-14 11:09:31,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2626320.0, ans=0.125 2024-08-14 11:09:35,187 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 11:09:43,620 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 11:09:51,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2626420.0, ans=0.0 2024-08-14 11:09:55,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1800, loss[loss=0.1154, beats_loss=0.008304, ecapa_loss=0.0001723, whisper_loss=0.1054, over 19262.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001504, whisper_loss=0.08971, over 3872262.51 frames. ], batch size: 75, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:10:02,573 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 11:10:05,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.260e+01 2.552e+01 2.816e+01 4.964e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-14 11:10:47,234 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 11:10:48,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.63 vs. limit=10.0 2024-08-14 11:10:55,914 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 11:11:08,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-08-14 11:11:11,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1850, loss[loss=0.08605, beats_loss=0.0121, ecapa_loss=0.0001619, whisper_loss=0.07232, over 21749.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.00015, whisper_loss=0.09031, over 3858181.98 frames. 
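The `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` lines track a running distribution of gradient norms, and the logged threshold equals `clipping_scale` times the median quartile (e.g. 2.0 × 2.552e+01 = 5.104e+01 above). A sketch of clipping against such an adaptive threshold follows; the class and method names are illustrative, not the actual `optim.py` API:

```python
from collections import deque
import statistics

class GradNormClipper:
    """Clip gradients against clipping_scale * median of recent grad norms (sketch)."""
    def __init__(self, clipping_scale=2.0, history=400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # rolling window of recent norms

    def scale_for(self, grad_norm):
        """Return the factor to multiply this step's gradients by."""
        self.norms.append(grad_norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        return min(1.0, threshold / grad_norm) if grad_norm > 0 else 1.0

clipper = GradNormClipper(clipping_scale=2.0)
for n in [20.0, 24.0, 26.0, 28.0]:
    clipper.scale_for(n)          # typical norms: no clipping (factor 1.0)
print(clipper.scale_for(524.0))   # outlier: scaled down to (2 * 26.0) / 524
```

This matches the later `WARNING ... Scaling gradients by 0.0937..., model_norm_threshold=49.13...` line, where 49.13 / 0.0937 ≈ 524, i.e. an outlier norm roughly ten times the threshold.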
], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:11:15,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-14 11:11:16,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2627020.0, ans=0.1 2024-08-14 11:11:42,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2627220.0, ans=0.125 2024-08-14 11:11:48,868 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 11:11:59,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2627320.0, ans=0.125 2024-08-14 11:12:02,056 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-14 11:12:02,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2627320.0, ans=0.0 2024-08-14 11:12:05,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2627320.0, ans=0.125 2024-08-14 11:12:08,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2627420.0, ans=0.125 2024-08-14 11:12:11,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2024-08-14 11:12:25,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1900, loss[loss=0.08898, beats_loss=0.01086, ecapa_loss=0.0001225, whisper_loss=0.0769, over 20173.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001517, whisper_loss=0.09095, over 3879488.54 frames. 
], batch size: 78, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:12:33,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.313e+01 2.505e+01 2.769e+01 4.411e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-14 11:12:51,277 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 11:13:01,906 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 11:13:05,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-14 11:13:06,104 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 11:13:08,409 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.895e+05 2024-08-14 11:13:12,828 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 11:13:30,108 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 11:13:39,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 1950, loss[loss=0.1053, beats_loss=0.01034, ecapa_loss=0.0001666, whisper_loss=0.09333, over 19346.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.000152, whisper_loss=0.09069, over 3852226.42 frames. ], batch size: 79, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:13:41,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2628020.0, ans=0.1 2024-08-14 11:13:51,121 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
15 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 11:14:10,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2628220.0, ans=0.125 2024-08-14 11:14:11,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-14 11:14:13,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. limit=6.0 2024-08-14 11:14:23,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2628320.0, ans=0.125 2024-08-14 11:14:38,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2628420.0, ans=0.1 2024-08-14 11:14:46,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2628420.0, ans=0.125 2024-08-14 11:14:54,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2000, loss[loss=0.08006, beats_loss=0.0142, ecapa_loss=0.0001318, whisper_loss=0.06455, over 22584.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001527, whisper_loss=0.09074, over 3882667.65 frames. ], batch size: 94, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:14:57,093 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
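Each `loss[...]` entry reports three distillation terms (`beats_loss`, `ecapa_loss`, `whisper_loss`) plus a total. The logged totals are consistent with a weighted sum in which the ECAPA term carries weight 10 (matching `scale_10.0` in the experiment directory name), e.g. 0.01066 + 10 × 0.0001506 + 0.08932 ≈ 0.1015. A sketch of that combination; the function name and the claim that these are the only terms are assumptions, not the verified contents of `train_multi_KD3.py`:

```python
def combine_losses(beats_loss, ecapa_loss, whisper_loss,
                   beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the three teacher-distillation losses (illustrative)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Reproduce the tot_loss of the "Epoch 19, batch 1600" summary above:
print(round(combine_losses(0.01066, 0.0001506, 0.08932), 4))  # -> 0.1015
```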
25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 11:15:04,509 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.355e+01 2.639e+01 2.929e+01 2.426e+02, threshold=5.277e+01, percent-clipped=2.0 2024-08-14 11:15:04,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2628520.0, ans=0.0 2024-08-14 11:15:10,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2628620.0, ans=0.0 2024-08-14 11:15:10,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-08-14 11:15:11,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-08-14 11:15:28,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.04 vs. limit=10.0 2024-08-14 11:15:36,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2628720.0, ans=0.0 2024-08-14 11:15:48,645 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 11:16:11,347 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 11:16:11,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2629020.0, ans=0.125 2024-08-14 11:16:12,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2050, loss[loss=0.0987, beats_loss=0.01109, ecapa_loss=0.0001332, whisper_loss=0.08627, over 23210.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001527, whisper_loss=0.09068, over 3876023.22 frames. 
], batch size: 91, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:16:21,591 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 11:16:23,070 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 11:16:28,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2629120.0, ans=0.125 2024-08-14 11:16:40,282 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-14 11:16:42,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2629220.0, ans=0.1 2024-08-14 11:16:43,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2629220.0, ans=0.0 2024-08-14 11:16:48,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2629220.0, ans=0.2 2024-08-14 11:16:49,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-14 11:17:16,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2629420.0, ans=0.125 2024-08-14 11:17:18,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2629420.0, ans=0.0 2024-08-14 11:17:18,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2629420.0, ans=0.04949747468305833 2024-08-14 11:17:30,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2100, loss[loss=0.08768, beats_loss=0.01237, ecapa_loss=0.0001494, whisper_loss=0.07382, over 21805.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09031, over 3872321.33 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:17:35,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2629520.0, ans=0.0 2024-08-14 11:17:39,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.267e+01 2.457e+01 2.784e+01 3.709e+01, threshold=4.913e+01, percent-clipped=0.0 2024-08-14 11:17:42,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2629520.0, ans=0.1 2024-08-14 11:17:59,304 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09374744445085526, model_norm_threshold=49.13302230834961 2024-08-14 11:17:59,528 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.336e+04, grad_sumsq=7.336e+04, orig_rms_sq=1.000e+00 2024-08-14 11:18:07,598 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 11:18:16,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2024-08-14 11:18:26,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2629820.0, ans=0.125 2024-08-14 11:18:49,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2150, loss[loss=0.1123, beats_loss=0.01006, ecapa_loss=0.0001616, whisper_loss=0.1006, over 18130.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001511, whisper_loss=0.09059, over 3875722.13 frames. ], batch size: 74, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:18:59,605 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 11:19:01,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.62 vs. limit=15.0 2024-08-14 11:19:10,074 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 11:19:29,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2630220.0, ans=0.0 2024-08-14 11:19:33,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2630220.0, ans=0.125 2024-08-14 11:19:38,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2630320.0, ans=0.125 2024-08-14 11:19:46,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2630320.0, ans=0.0 2024-08-14 11:19:51,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2630420.0, ans=0.1 2024-08-14 11:19:53,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2024-08-14 11:20:04,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2630420.0, ans=0.125 2024-08-14 11:20:08,781 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2200, loss[loss=0.0964, beats_loss=0.009911, ecapa_loss=0.0001538, whisper_loss=0.08495, over 19059.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001515, whisper_loss=0.09056, over 3837218.32 frames. 
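The `Whitening: name=..., metric=M vs. limit=L` lines measure how far a layer's channel activations are from a white (isotropic) covariance, applying pressure only when the metric exceeds its limit. One whiteness measure with the same flavour, though not necessarily the exact `scaling.py` formula, is the mean squared eigenvalue of the channel covariance divided by the squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as variance concentrates in few directions:

```python
import numpy as np

def whitening_metric(x):
    """Whiteness of activations x with shape (num_frames, num_channels); 1.0 = white."""
    x = x - x.mean(axis=0, keepdims=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = np.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

rng = np.random.default_rng(0)
white = rng.standard_normal((4000, 8))
# Concentrate variance in one channel and starve most of the others:
low_rank = white @ np.diag([3.0, 1.0, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1])
print(whitening_metric(white))     # close to 1.0
print(whitening_metric(low_rank))  # noticeably larger
```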
], batch size: 76, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:20:12,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2630520.0, ans=0.0 2024-08-14 11:20:17,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.405e+01 2.616e+01 2.970e+01 5.241e+02, threshold=5.232e+01, percent-clipped=2.0 2024-08-14 11:20:26,355 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2024-08-14 11:20:35,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2630620.0, ans=0.1 2024-08-14 11:20:53,549 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 11:21:02,691 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:21:04,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2630820.0, ans=0.125 2024-08-14 11:21:09,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2630820.0, ans=0.1 2024-08-14 11:21:13,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2630920.0, ans=0.125 2024-08-14 11:21:15,101 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 11:21:29,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2250, loss[loss=0.1174, beats_loss=0.008973, ecapa_loss=0.0001705, whisper_loss=0.1067, over 22597.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001522, whisper_loss=0.09079, over 3828801.34 frames. 
], batch size: 92, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:21:32,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2631020.0, ans=0.09899494936611666 2024-08-14 11:21:40,143 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 11:21:41,377 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-14 11:21:42,448 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 11:22:03,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2631220.0, ans=0.0 2024-08-14 11:22:49,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2300, loss[loss=0.09337, beats_loss=0.01476, ecapa_loss=0.0001181, whisper_loss=0.07743, over 16742.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01065, ecapa_loss=0.0001521, whisper_loss=0.09201, over 3894375.88 frames. ], batch size: 66, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:22:49,430 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 11:22:53,038 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 11:22:58,676 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.348e+01 2.609e+01 2.849e+01 2.533e+02, threshold=5.217e+01, percent-clipped=1.0 2024-08-14 11:22:58,833 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-14 11:23:26,469 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 11:23:28,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2631720.0, ans=0.1 2024-08-14 11:23:44,015 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 11:23:56,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2631920.0, ans=0.1 2024-08-14 11:24:02,447 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 11:24:08,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2350, loss[loss=0.1134, beats_loss=0.009331, ecapa_loss=0.000173, whisper_loss=0.1024, over 16301.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001534, whisper_loss=0.09172, over 3883810.14 frames. ], batch size: 68, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:24:21,552 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 11:24:27,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2632120.0, ans=0.2 2024-08-14 11:24:31,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-14 11:24:42,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-08-14 11:24:44,666 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 11:24:49,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2632220.0, ans=0.125 2024-08-14 11:24:54,655 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 31 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 11:24:56,247 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 11:24:56,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2632320.0, ans=0.125 2024-08-14 11:25:11,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2632420.0, ans=0.125 2024-08-14 11:25:29,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2400, loss[loss=0.1043, beats_loss=0.01176, ecapa_loss=0.0001328, whisper_loss=0.09125, over 22670.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001539, whisper_loss=0.09184, over 3906089.03 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:25:29,283 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 11:25:38,063 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.318e+01 2.574e+01 2.948e+01 5.851e+01, threshold=5.149e+01, percent-clipped=1.0 2024-08-14 11:25:53,104 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 11:25:56,226 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 11:26:06,302 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 11:26:12,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2632720.0, ans=0.125 2024-08-14 11:26:15,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2632820.0, ans=0.0 2024-08-14 11:26:20,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2632820.0, ans=0.0 2024-08-14 11:26:22,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2632820.0, ans=0.125 2024-08-14 11:26:24,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2632820.0, ans=0.125 2024-08-14 11:26:28,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2632820.0, ans=0.0 2024-08-14 11:26:34,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2632920.0, ans=0.125 2024-08-14 11:26:42,682 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 11:26:46,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2450, loss[loss=0.1059, beats_loss=0.01309, ecapa_loss=0.0001179, whisper_loss=0.09167, over 23209.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001539, whisper_loss=0.09143, over 3915850.44 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:27:11,358 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 11:27:22,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2633220.0, ans=0.125 2024-08-14 11:27:23,625 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 11:27:25,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2633220.0, ans=0.0 2024-08-14 11:27:27,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-08-14 11:27:29,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2633220.0, ans=0.1 2024-08-14 11:27:31,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2633320.0, ans=0.0 2024-08-14 11:27:33,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2633320.0, ans=0.2 2024-08-14 11:27:56,069 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 11:28:03,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2500, loss[loss=0.09967, beats_loss=0.01186, ecapa_loss=0.0001546, whisper_loss=0.08626, over 20876.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01055, ecapa_loss=0.0001542, whisper_loss=0.09216, over 3888518.38 frames. ], batch size: 82, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:28:12,000 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.237e+01 2.442e+01 2.717e+01 4.288e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-14 11:28:21,811 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 11:29:04,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2633920.0, ans=0.0 2024-08-14 11:29:20,839 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2550, loss[loss=0.1211, beats_loss=0.008058, ecapa_loss=0.0001763, whisper_loss=0.1113, over 14859.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01051, ecapa_loss=0.0001538, whisper_loss=0.09291, over 3886634.84 frames. ], batch size: 58, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:29:27,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2634020.0, ans=0.0 2024-08-14 11:29:38,992 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 12 from Vox, 49 fro AS 2024-08-14 11:29:40,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2634120.0, ans=0.0 2024-08-14 11:29:56,907 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 11:29:58,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2634220.0, ans=0.1 2024-08-14 11:30:28,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2634420.0, ans=0.0 2024-08-14 11:30:33,493 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 11:30:40,425 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2600, loss[loss=0.1042, beats_loss=0.01107, ecapa_loss=0.0001373, whisper_loss=0.09173, over 22445.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01056, ecapa_loss=0.0001546, whisper_loss=0.0923, over 3900261.51 frames. 
], batch size: 86, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:30:49,114 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.475e+01 2.751e+01 3.111e+01 1.109e+02, threshold=5.502e+01, percent-clipped=3.0 2024-08-14 11:30:53,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-08-14 11:31:23,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2634720.0, ans=0.0 2024-08-14 11:31:26,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2634820.0, ans=0.125 2024-08-14 11:31:26,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-14 11:31:52,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2634920.0, ans=0.125 2024-08-14 11:31:58,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2650, loss[loss=0.09682, beats_loss=0.01138, ecapa_loss=0.0001709, whisper_loss=0.08373, over 21544.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001553, whisper_loss=0.09187, over 3888263.73 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:31:58,339 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-14 11:32:02,830 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.367e-02 2024-08-14 11:32:10,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2635020.0, ans=0.04949747468305833 2024-08-14 11:32:11,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2635120.0, ans=0.125 2024-08-14 11:32:17,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2635120.0, ans=0.1 2024-08-14 11:32:20,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2024-08-14 11:32:22,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2635120.0, ans=0.04949747468305833 2024-08-14 11:32:29,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2024-08-14 11:32:32,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2635220.0, ans=0.1 2024-08-14 11:32:44,704 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 11:33:10,592 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-14 11:33:13,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2700, loss[loss=0.1124, beats_loss=0.01052, ecapa_loss=0.0001688, whisper_loss=0.1001, over 22578.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001556, whisper_loss=0.09165, over 3898371.23 frames. 
], batch size: 93, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:33:22,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.370e+01 2.672e+01 3.079e+01 4.287e+01, threshold=5.344e+01, percent-clipped=0.0 2024-08-14 11:33:35,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2635620.0, ans=0.0 2024-08-14 11:33:36,974 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 11:34:08,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2635820.0, ans=0.0 2024-08-14 11:34:12,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2635820.0, ans=0.0 2024-08-14 11:34:14,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2635820.0, ans=0.0 2024-08-14 11:34:37,827 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2750, loss[loss=0.07889, beats_loss=0.01201, ecapa_loss=0.0001498, whisper_loss=0.06539, over 18149.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001545, whisper_loss=0.09154, over 3902096.36 frames. ], batch size: 74, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:34:46,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.53 vs. limit=22.5 2024-08-14 11:34:47,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.73 vs. 
limit=15.0 2024-08-14 11:35:01,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2636120.0, ans=0.0 2024-08-14 11:35:31,475 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 11:36:07,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2800, loss[loss=0.1077, beats_loss=0.009847, ecapa_loss=0.0001332, whisper_loss=0.09649, over 20991.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001541, whisper_loss=0.09178, over 3908345.98 frames. ], batch size: 81, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:36:19,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.323e+01 2.596e+01 2.984e+01 3.829e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-14 11:36:33,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2636620.0, ans=0.1 2024-08-14 11:36:50,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2636720.0, ans=0.125 2024-08-14 11:37:35,468 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 11:37:38,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2636920.0, ans=0.0 2024-08-14 11:37:48,244 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2850, loss[loss=0.08838, beats_loss=0.01055, ecapa_loss=0.0001689, whisper_loss=0.07615, over 20281.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001531, whisper_loss=0.09089, over 3862130.42 frames. ], batch size: 83, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:37:55,138 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 11:38:03,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2637020.0, ans=0.125 2024-08-14 11:38:10,676 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 11:38:17,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2637120.0, ans=0.125 2024-08-14 11:38:27,495 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 11:38:38,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2024-08-14 11:38:46,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2637220.0, ans=0.0 2024-08-14 11:38:49,926 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 11:39:01,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2637320.0, ans=0.125 2024-08-14 11:39:08,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-14 11:39:12,557 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 24 from Vox, 17 fro AS 2024-08-14 11:39:51,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2900, loss[loss=0.11, beats_loss=0.008456, ecapa_loss=0.00016, whisper_loss=0.09997, over 20416.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001536, whisper_loss=0.09087, over 3855027.15 frames. 
], batch size: 81, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:40:05,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.270e+01 2.545e+01 2.877e+01 7.977e+01, threshold=5.090e+01, percent-clipped=2.0 2024-08-14 11:40:48,425 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 11:41:01,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-14 11:41:04,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2637820.0, ans=0.09899494936611666 2024-08-14 11:41:06,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-14 11:41:47,498 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 11:41:53,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2637920.0, ans=0.0 2024-08-14 11:41:57,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 2950, loss[loss=0.1127, beats_loss=0.009119, ecapa_loss=0.000186, whisper_loss=0.1017, over 18343.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001556, whisper_loss=0.09069, over 3846404.02 frames. ], batch size: 75, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:42:07,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=12.0 2024-08-14 11:43:03,129 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 11:43:11,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=2638320.0, ans=12.0 2024-08-14 11:43:15,484 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 11:43:20,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2638320.0, ans=0.2 2024-08-14 11:43:46,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2638420.0, ans=0.0 2024-08-14 11:43:56,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3000, loss[loss=0.09092, beats_loss=0.01038, ecapa_loss=0.0001745, whisper_loss=0.0788, over 19246.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001567, whisper_loss=0.09068, over 3859224.96 frames. ], batch size: 80, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:43:56,113 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 11:44:19,616 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1831, 3.7050, 3.7981, 4.0315], device='cuda:0') 2024-08-14 11:44:34,407 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005472, whisper_loss=0.2471, over 922467.00 frames. 2024-08-14 11:44:51,392 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on SV_voxceleb1: loss=0.00425, beats_loss=0, ecapa_loss=0.000425, whisper_loss=0, over 939242.00 frames. 2024-08-14 11:46:47,999 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 11:46:48,003 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 11:46:52,471 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 11:46:54,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2638520.0, ans=0.0 2024-08-14 11:46:57,812 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.515e+01 2.846e+01 3.137e+01 6.212e+01, threshold=5.693e+01, percent-clipped=1.0 2024-08-14 11:47:21,956 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 11:47:22,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2638720.0, ans=0.125 2024-08-14 11:47:35,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2638820.0, ans=0.125 2024-08-14 11:47:55,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2638920.0, ans=0.2 2024-08-14 11:47:56,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2638920.0, ans=0.0 2024-08-14 11:47:58,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2638920.0, ans=0.0 2024-08-14 11:48:07,511 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3050, loss[loss=0.09098, beats_loss=0.01068, ecapa_loss=0.0001627, whisper_loss=0.07867, over 14333.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001566, whisper_loss=0.0911, over 3896501.60 frames. 
], batch size: 58, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:48:07,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2639020.0, ans=0.0 2024-08-14 11:48:12,792 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 11:48:24,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2639120.0, ans=0.09899494936611666 2024-08-14 11:48:28,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2639120.0, ans=0.0 2024-08-14 11:48:29,652 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 11:48:34,288 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 11:48:42,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2639220.0, ans=15.0 2024-08-14 11:48:46,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2639220.0, ans=0.125 2024-08-14 11:48:49,965 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 11:49:04,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2639320.0, ans=0.2 2024-08-14 11:49:06,332 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 11:49:16,392 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:49:25,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2639420.0, ans=0.0 2024-08-14 11:49:30,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3100, loss[loss=0.09743, beats_loss=0.01339, ecapa_loss=0.0001282, whisper_loss=0.08275, over 16337.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01075, ecapa_loss=0.0001561, whisper_loss=0.09134, over 3881357.01 frames. ], batch size: 61, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:49:30,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2639520.0, ans=0.125 2024-08-14 11:49:37,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2639520.0, ans=0.0 2024-08-14 11:49:39,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.342e+01 2.628e+01 3.036e+01 4.820e+01, threshold=5.256e+01, percent-clipped=0.0 2024-08-14 11:49:41,446 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 11:50:14,157 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 11:50:16,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2639820.0, ans=0.0 2024-08-14 11:50:38,375 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 11:50:42,703 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-264000.pt 2024-08-14 11:50:49,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3150, loss[loss=0.124, beats_loss=0.008452, ecapa_loss=0.0001708, whisper_loss=0.1138, over 23115.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001565, whisper_loss=0.09139, over 3887772.95 frames. ], batch size: 90, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:51:05,093 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-14 11:51:23,236 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 11:51:39,281 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 11:51:43,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2640320.0, ans=0.125 2024-08-14 11:51:49,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2640320.0, ans=0.125 2024-08-14 11:52:01,658 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 11:52:04,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2640420.0, ans=0.0 2024-08-14 11:52:06,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3200, loss[loss=0.1011, beats_loss=0.01188, ecapa_loss=0.0001458, whisper_loss=0.08772, over 22459.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001566, whisper_loss=0.09138, over 3849729.72 frames. ], batch size: 89, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:52:10,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2640520.0, ans=0.1 2024-08-14 11:52:11,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2640520.0, ans=0.0 2024-08-14 11:52:16,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2024-08-14 11:52:16,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.359e+01 2.591e+01 2.913e+01 5.020e+01, threshold=5.181e+01, percent-clipped=0.0 2024-08-14 11:53:13,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2640920.0, ans=0.0 2024-08-14 11:53:16,302 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 11:53:22,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3250, loss[loss=0.1018, beats_loss=0.01082, ecapa_loss=0.0001826, whisper_loss=0.08913, over 20403.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.0001564, whisper_loss=0.09139, over 3890463.72 frames. 
], batch size: 86, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:53:24,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2641020.0, ans=0.0 2024-08-14 11:53:27,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2641020.0, ans=0.125 2024-08-14 11:53:47,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2641120.0, ans=0.0 2024-08-14 11:54:13,909 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 11:54:24,561 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-14 11:54:42,880 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 11:54:43,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2641520.0, ans=0.125 2024-08-14 11:54:44,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3300, loss[loss=0.07979, beats_loss=0.01121, ecapa_loss=0.0001464, whisper_loss=0.06711, over 15274.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001559, whisper_loss=0.09151, over 3891943.45 frames. ], batch size: 60, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:54:51,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. 
limit=6.0 2024-08-14 11:54:54,009 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.343e+01 2.686e+01 3.135e+01 1.274e+02, threshold=5.372e+01, percent-clipped=3.0 2024-08-14 11:55:01,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2641620.0, ans=0.125 2024-08-14 11:55:18,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2641720.0, ans=0.1 2024-08-14 11:55:35,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2641820.0, ans=0.2 2024-08-14 11:55:50,069 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 11:55:56,061 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 11:56:04,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3350, loss[loss=0.1119, beats_loss=0.01041, ecapa_loss=0.0001893, whisper_loss=0.09963, over 22268.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001567, whisper_loss=0.09181, over 3900162.18 frames. ], batch size: 92, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:56:16,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2642020.0, ans=0.0 2024-08-14 11:56:43,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2642220.0, ans=0.125 2024-08-14 11:56:47,793 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 11:56:50,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. 
limit=22.5 2024-08-14 11:56:52,531 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 11:56:55,508 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 11:57:23,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3400, loss[loss=0.07032, beats_loss=0.01396, ecapa_loss=0.0001878, whisper_loss=0.05448, over 20448.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001571, whisper_loss=0.09085, over 3921393.42 frames. ], batch size: 93, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 11:57:34,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.486e+01 2.836e+01 3.327e+01 1.695e+02, threshold=5.673e+01, percent-clipped=4.0 2024-08-14 11:57:45,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2642620.0, ans=0.125 2024-08-14 11:57:57,908 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 11:57:58,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2642720.0, ans=0.0 2024-08-14 11:58:11,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2642820.0, ans=0.125 2024-08-14 11:58:24,206 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 11:58:26,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2642920.0, ans=0.0 2024-08-14 11:58:43,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3450, loss[loss=0.09633, beats_loss=0.01206, ecapa_loss=0.0001426, whisper_loss=0.08284, over 19986.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001579, whisper_loss=0.09026, over 3911257.85 frames. ], batch size: 78, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 11:59:11,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2643120.0, ans=0.1 2024-08-14 11:59:12,934 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 11:59:49,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2643420.0, ans=0.125 2024-08-14 12:00:03,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3500, loss[loss=0.1207, beats_loss=0.01151, ecapa_loss=0.0001682, whisper_loss=0.1076, over 22436.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01085, ecapa_loss=0.0001572, whisper_loss=0.09038, over 3908570.63 frames. ], batch size: 93, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:00:10,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.93 vs. limit=10.0 2024-08-14 12:00:15,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2643520.0, ans=0.2 2024-08-14 12:00:16,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+01 2.311e+01 2.583e+01 2.814e+01 3.893e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 12:00:30,901 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 12:00:43,131 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-14 12:00:51,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2643720.0, ans=0.0 2024-08-14 12:01:01,830 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 12:01:02,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2643820.0, ans=0.125 2024-08-14 12:01:12,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2643920.0, ans=0.0 2024-08-14 12:01:14,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2643920.0, ans=0.125 2024-08-14 12:01:17,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2643920.0, ans=0.0 2024-08-14 12:01:18,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-14 12:01:19,141 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-14 12:01:26,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3550, loss[loss=0.09156, beats_loss=0.01218, ecapa_loss=0.000164, whisper_loss=0.07774, over 22507.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01088, ecapa_loss=0.0001567, whisper_loss=0.09, over 3895586.59 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:01:34,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.79 vs. 
limit=12.0 2024-08-14 12:01:57,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2644120.0, ans=0.0 2024-08-14 12:01:58,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.85 vs. limit=22.5 2024-08-14 12:02:06,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-14 12:02:08,843 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 12:02:13,680 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 12:02:29,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2644320.0, ans=0.05 2024-08-14 12:02:35,004 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 12:02:51,348 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3600, loss[loss=0.1312, beats_loss=0.007407, ecapa_loss=0.0002002, whisper_loss=0.1218, over 20409.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001572, whisper_loss=0.09034, over 3867611.55 frames. ], batch size: 80, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:02:58,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2644520.0, ans=0.125 2024-08-14 12:02:58,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-08-14 12:03:00,502 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 12:03:00,845 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.602e+00 2024-08-14 12:03:01,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.481e+01 2.628e+01 2.850e+01 4.421e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-14 12:03:02,218 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-14 12:03:08,182 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.302e+01 2024-08-14 12:03:21,591 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 12:03:23,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2644720.0, ans=0.1 2024-08-14 12:03:25,334 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.910e-01 2024-08-14 12:03:25,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-14 12:03:31,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2024-08-14 12:03:34,316 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 12:03:38,658 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-14 12:03:41,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2644820.0, ans=0.0 2024-08-14 12:03:51,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2024-08-14 12:04:08,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3650, loss[loss=0.122, beats_loss=0.01012, ecapa_loss=0.0001244, whisper_loss=0.1107, over 24469.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.000158, whisper_loss=0.09059, over 3848471.19 frames. ], batch size: 93, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:04:22,266 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 37 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 12:04:31,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2645120.0, ans=0.0 2024-08-14 12:04:34,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2024-08-14 12:05:24,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3700, loss[loss=0.09937, beats_loss=0.009355, ecapa_loss=0.0001961, whisper_loss=0.08805, over 17797.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001584, whisper_loss=0.0904, over 3842495.83 frames. 
], batch size: 73, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:05:27,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2645520.0, ans=0.2 2024-08-14 12:05:33,728 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.298e+01 2.531e+01 2.738e+01 1.071e+02, threshold=5.062e+01, percent-clipped=1.0 2024-08-14 12:06:07,803 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 12:06:09,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2645820.0, ans=0.0 2024-08-14 12:06:25,286 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 12:06:27,046 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 12:06:39,005 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3750, loss[loss=0.1172, beats_loss=0.01096, ecapa_loss=0.0001338, whisper_loss=0.1049, over 20386.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001576, whisper_loss=0.09045, over 3827696.29 frames. ], batch size: 80, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:07:34,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2646320.0, ans=0.125 2024-08-14 12:07:44,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2646420.0, ans=0.125 2024-08-14 12:07:47,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. 
limit=15.0 2024-08-14 12:07:55,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2646520.0, ans=0.0 2024-08-14 12:07:56,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3800, loss[loss=0.08639, beats_loss=0.01007, ecapa_loss=0.0001703, whisper_loss=0.07462, over 19423.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001581, whisper_loss=0.091, over 3823723.94 frames. ], batch size: 79, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:08:05,965 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.378e+01 2.670e+01 2.953e+01 4.426e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-14 12:08:32,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2646720.0, ans=0.0 2024-08-14 12:08:38,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2646720.0, ans=0.0 2024-08-14 12:08:44,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2646820.0, ans=0.125 2024-08-14 12:08:47,480 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 12:08:49,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2024-08-14 12:08:50,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.81 vs. 
limit=12.0 2024-08-14 12:08:55,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2646820.0, ans=0.125 2024-08-14 12:09:00,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2646920.0, ans=0.0 2024-08-14 12:09:05,815 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 12:09:06,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5 2024-08-14 12:09:11,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2646920.0, ans=0.05 2024-08-14 12:09:14,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3850, loss[loss=0.1178, beats_loss=0.01202, ecapa_loss=0.000154, whisper_loss=0.1042, over 18443.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001577, whisper_loss=0.09089, over 3841846.76 frames. ], batch size: 73, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:09:15,720 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 12:09:55,598 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 12:10:18,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.66 vs. limit=15.0 2024-08-14 12:10:22,626 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 12:10:35,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3900, loss[loss=0.1058, beats_loss=0.01239, ecapa_loss=0.000142, whisper_loss=0.09201, over 22085.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001577, whisper_loss=0.09141, over 3879906.54 frames. ], batch size: 88, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:10:38,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2647520.0, ans=0.125 2024-08-14 12:10:48,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.360e+01 2.691e+01 2.914e+01 3.544e+02, threshold=5.383e+01, percent-clipped=1.0 2024-08-14 12:11:07,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2647620.0, ans=0.0 2024-08-14 12:11:15,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5 2024-08-14 12:11:28,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2647820.0, ans=0.0 2024-08-14 12:11:47,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2647920.0, ans=0.025 2024-08-14 12:11:47,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2024-08-14 12:11:55,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2647920.0, ans=0.125 2024-08-14 12:12:01,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2647920.0, ans=0.0 2024-08-14 12:12:05,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 3950, loss[loss=0.09668, beats_loss=0.01113, ecapa_loss=0.0001472, whisper_loss=0.08408, over 18229.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01065, ecapa_loss=0.0001586, whisper_loss=0.09172, over 3904630.13 frames. ], batch size: 72, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:12:54,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2648220.0, ans=0.125 2024-08-14 12:13:15,981 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 12:13:32,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2648420.0, ans=0.0 2024-08-14 12:13:48,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2648420.0, ans=0.0 2024-08-14 12:13:50,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2648520.0, ans=0.125 2024-08-14 12:13:52,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4000, loss[loss=0.09006, beats_loss=0.01235, ecapa_loss=0.0001367, whisper_loss=0.07635, over 22106.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001585, whisper_loss=0.0915, over 3915121.24 frames. 
], batch size: 89, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:13:52,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2648520.0, ans=0.05 2024-08-14 12:13:54,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2648520.0, ans=0.0 2024-08-14 12:14:07,568 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.464e+01 2.683e+01 2.941e+01 4.279e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-14 12:14:56,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2648720.0, ans=0.0 2024-08-14 12:15:12,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2648820.0, ans=0.125 2024-08-14 12:15:16,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-14 12:15:22,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=6.0 2024-08-14 12:15:43,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2648920.0, ans=0.125 2024-08-14 12:15:52,694 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4050, loss[loss=0.1244, beats_loss=0.006628, ecapa_loss=0.0001907, whisper_loss=0.1159, over 20710.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01055, ecapa_loss=0.0001587, whisper_loss=0.09218, over 3908659.18 frames. 
], batch size: 80, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:15:53,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2649020.0, ans=0.125 2024-08-14 12:16:03,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2649020.0, ans=0.1 2024-08-14 12:16:50,900 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 12:16:51,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2649320.0, ans=0.025 2024-08-14 12:16:57,040 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 12:17:02,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2649420.0, ans=0.0 2024-08-14 12:17:18,110 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 12:17:20,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4100, loss[loss=0.09328, beats_loss=0.01268, ecapa_loss=0.0001763, whisper_loss=0.07884, over 22024.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001591, whisper_loss=0.09155, over 3865348.89 frames. ], batch size: 92, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:17:28,789 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 12:17:33,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.297e+01 2.541e+01 2.897e+01 6.382e+01, threshold=5.082e+01, percent-clipped=1.0 2024-08-14 12:17:37,982 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 12:17:40,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2649620.0, ans=0.035 2024-08-14 12:17:44,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2649620.0, ans=0.1 2024-08-14 12:17:46,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2649620.0, ans=0.125 2024-08-14 12:17:55,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2649620.0, ans=0.2 2024-08-14 12:18:00,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2649720.0, ans=0.125 2024-08-14 12:18:01,014 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:18:18,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2649820.0, ans=0.1 2024-08-14 12:18:39,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2649920.0, ans=0.125 2024-08-14 12:18:41,125 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 12:18:53,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4150, loss[loss=0.1035, beats_loss=0.01016, ecapa_loss=0.0001504, whisper_loss=0.09185, over 14690.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001597, whisper_loss=0.09139, over 3858486.48 frames. 
], batch size: 59, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:18:56,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2650020.0, ans=0.1 2024-08-14 12:19:31,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2650220.0, ans=0.0 2024-08-14 12:19:59,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2650420.0, ans=0.125 2024-08-14 12:20:05,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2650420.0, ans=0.125 2024-08-14 12:20:16,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4200, loss[loss=0.09452, beats_loss=0.01182, ecapa_loss=0.000166, whisper_loss=0.08103, over 22462.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001596, whisper_loss=0.09119, over 3838171.09 frames. ], batch size: 92, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:20:24,740 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 12:20:26,428 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 22 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-14 12:20:27,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.390e+01 2.581e+01 2.872e+01 4.290e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-14 12:20:34,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2650620.0, ans=0.0 2024-08-14 12:20:45,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.66 vs. 
limit=15.0 2024-08-14 12:21:01,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2650720.0, ans=0.0 2024-08-14 12:21:13,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2650820.0, ans=0.0 2024-08-14 12:21:16,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2650820.0, ans=0.0 2024-08-14 12:21:26,958 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-14 12:21:35,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2651020.0, ans=0.04949747468305833 2024-08-14 12:21:36,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4250, loss[loss=0.09404, beats_loss=0.009607, ecapa_loss=0.0001538, whisper_loss=0.0829, over 14491.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001577, whisper_loss=0.09045, over 3850474.09 frames. ], batch size: 59, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:21:44,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-14 12:21:48,822 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 12:21:55,300 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 12:22:01,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2651120.0, ans=0.09899494936611666 2024-08-14 12:22:23,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2651220.0, ans=0.0 2024-08-14 12:22:54,242 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.215e+00 2024-08-14 12:22:57,759 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4300, loss[loss=0.09809, beats_loss=0.01126, ecapa_loss=0.0001441, whisper_loss=0.08539, over 21618.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.0001573, whisper_loss=0.09046, over 3865731.72 frames. ], batch size: 86, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:23:08,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.461e+01 2.630e+01 3.002e+01 3.746e+02, threshold=5.260e+01, percent-clipped=1.0 2024-08-14 12:23:11,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2651520.0, ans=0.0 2024-08-14 12:23:13,920 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 12:23:38,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2651720.0, ans=0.1 2024-08-14 12:23:47,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2651820.0, ans=0.0 2024-08-14 12:24:03,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2651920.0, ans=0.0 2024-08-14 12:24:15,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4350, loss[loss=0.1216, beats_loss=0.008973, ecapa_loss=0.0001328, whisper_loss=0.1113, over 19976.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001585, whisper_loss=0.09047, over 3856674.28 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:24:31,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=12.0 2024-08-14 12:24:40,414 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 12:24:52,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.75 vs. limit=6.0 2024-08-14 12:24:55,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2652220.0, ans=0.2 2024-08-14 12:24:58,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2652220.0, ans=0.0 2024-08-14 12:25:07,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2652320.0, ans=0.125 2024-08-14 12:25:11,640 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 12:25:11,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2652320.0, ans=0.125 2024-08-14 12:25:16,081 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 12:25:20,669 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 12:25:30,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4400, loss[loss=0.09848, beats_loss=0.01105, ecapa_loss=0.0001454, whisper_loss=0.08597, over 18906.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001582, whisper_loss=0.09069, over 3871330.81 frames. ], batch size: 75, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:25:35,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2652520.0, ans=0.0 2024-08-14 12:25:40,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.397e+01 2.574e+01 2.948e+01 5.281e+01, threshold=5.148e+01, percent-clipped=1.0 2024-08-14 12:25:46,849 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 12:26:13,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2652820.0, ans=0.125 2024-08-14 12:26:18,721 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-14 12:26:21,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2652820.0, ans=0.125 2024-08-14 12:26:22,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.16 vs. 
limit=12.0 2024-08-14 12:26:33,849 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 12:26:43,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4450, loss[loss=0.09873, beats_loss=0.01011, ecapa_loss=0.00016, whisper_loss=0.08702, over 20340.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001575, whisper_loss=0.09075, over 3874347.40 frames. ], batch size: 80, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:26:57,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2653120.0, ans=0.0 2024-08-14 12:27:14,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2653220.0, ans=0.125 2024-08-14 12:27:22,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2653220.0, ans=0.0 2024-08-14 12:27:24,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2653220.0, ans=0.1 2024-08-14 12:27:37,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2653320.0, ans=0.125 2024-08-14 12:27:39,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2653320.0, ans=0.125 2024-08-14 12:27:56,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2653520.0, ans=0.0 2024-08-14 12:27:57,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4500, loss[loss=0.1041, beats_loss=0.01108, ecapa_loss=0.0001542, whisper_loss=0.09144, over 21568.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001569, whisper_loss=0.09068, over 3883262.33 frames. 
], batch size: 89, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:27:58,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2653520.0, ans=0.125 2024-08-14 12:28:08,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.507e+01 2.296e+01 2.547e+01 2.865e+01 4.084e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-14 12:28:11,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2653620.0, ans=0.125 2024-08-14 12:28:13,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-14 12:28:26,091 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 12:28:38,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-08-14 12:29:00,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2653920.0, ans=0.125 2024-08-14 12:29:06,296 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 12:29:12,671 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 12:29:13,865 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4550, loss[loss=0.1021, beats_loss=0.01348, ecapa_loss=0.0001173, whisper_loss=0.08742, over 19202.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001558, whisper_loss=0.0906, over 3919004.42 frames. 
], batch size: 72, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:29:37,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2654120.0, ans=0.125 2024-08-14 12:29:46,320 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 12:30:29,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4600, loss[loss=0.1114, beats_loss=0.01221, ecapa_loss=0.0001237, whisper_loss=0.098, over 22899.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001553, whisper_loss=0.09062, over 3913669.61 frames. ], batch size: 88, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:30:29,427 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 12:30:39,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.344e+01 2.580e+01 2.840e+01 1.542e+02, threshold=5.160e+01, percent-clipped=2.0 2024-08-14 12:30:47,833 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 12:31:29,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2654920.0, ans=0.0 2024-08-14 12:31:34,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2654920.0, ans=0.125 2024-08-14 12:31:45,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2655020.0, ans=0.125 2024-08-14 12:31:46,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4650, loss[loss=0.1098, beats_loss=0.01086, ecapa_loss=0.0001552, whisper_loss=0.09741, over 19070.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001553, whisper_loss=0.09044, over 3921793.09 frames. 
], batch size: 76, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:31:47,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2655020.0, ans=0.0 2024-08-14 12:32:26,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2655220.0, ans=0.0 2024-08-14 12:32:29,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-14 12:32:43,774 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 12:32:58,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2655420.0, ans=0.0 2024-08-14 12:33:02,871 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 12:33:06,472 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 12:33:13,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4700, loss[loss=0.1131, beats_loss=0.009174, ecapa_loss=0.0001795, whisper_loss=0.1022, over 22937.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001545, whisper_loss=0.09005, over 3929363.80 frames. ], batch size: 93, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:33:14,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.56 vs. 
limit=15.0 2024-08-14 12:33:25,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.499e+01 2.871e+01 5.538e+01, threshold=4.999e+01, percent-clipped=1.0 2024-08-14 12:33:39,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2655620.0, ans=0.125 2024-08-14 12:33:42,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0 2024-08-14 12:33:58,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2655720.0, ans=0.2 2024-08-14 12:33:59,987 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 18 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 12:34:05,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-14 12:34:27,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2655920.0, ans=0.0 2024-08-14 12:34:37,942 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4750, loss[loss=0.111, beats_loss=0.009943, ecapa_loss=0.0001557, whisper_loss=0.09954, over 23015.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001557, whisper_loss=0.09068, over 3952178.77 frames. ], batch size: 93, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:34:41,101 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 12:34:44,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2656020.0, ans=0.125 2024-08-14 12:34:47,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-14 12:34:52,675 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 12:34:54,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2656120.0, ans=0.125 2024-08-14 12:34:58,587 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 12:35:06,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2024-08-14 12:35:18,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2656220.0, ans=0.125 2024-08-14 12:35:25,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=22.5 2024-08-14 12:35:32,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.57 vs. 
limit=10.0 2024-08-14 12:35:40,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2656420.0, ans=0.125 2024-08-14 12:35:44,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2656420.0, ans=0.1 2024-08-14 12:35:51,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4800, loss[loss=0.09506, beats_loss=0.01054, ecapa_loss=0.0001915, whisper_loss=0.08261, over 18266.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001571, whisper_loss=0.09047, over 3923874.20 frames. ], batch size: 76, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:35:51,955 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 12:36:02,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.391e+01 2.624e+01 2.971e+01 4.050e+02, threshold=5.248e+01, percent-clipped=2.0 2024-08-14 12:36:03,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2656520.0, ans=0.125 2024-08-14 12:36:49,409 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 12:37:05,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4850, loss[loss=0.0773, beats_loss=0.0125, ecapa_loss=0.0001221, whisper_loss=0.06359, over 14283.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001586, whisper_loss=0.0908, over 3933666.65 frames. ], batch size: 56, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:37:10,461 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 12:37:17,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=15.0 2024-08-14 12:37:22,860 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 12:37:23,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2657120.0, ans=0.1 2024-08-14 12:37:24,600 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 12:37:41,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2657220.0, ans=0.0 2024-08-14 12:38:20,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4900, loss[loss=0.1098, beats_loss=0.01111, ecapa_loss=0.0001389, whisper_loss=0.09726, over 19103.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01076, ecapa_loss=0.0001567, whisper_loss=0.09027, over 3899785.96 frames. ], batch size: 76, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:38:31,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.386e+01 2.578e+01 2.812e+01 7.156e+01, threshold=5.157e+01, percent-clipped=2.0 2024-08-14 12:38:50,194 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-14 12:38:54,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2657720.0, ans=0.125 2024-08-14 12:39:01,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-14 12:39:27,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-08-14 12:39:32,936 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-14 12:39:36,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 4950, loss[loss=0.09676, beats_loss=0.01106, ecapa_loss=0.0001207, whisper_loss=0.08449, over 17105.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0108, ecapa_loss=0.0001559, whisper_loss=0.08993, over 3877277.27 frames. ], batch size: 67, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:39:43,975 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 18 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 12:39:51,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2024-08-14 12:39:56,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2658120.0, ans=0.2 2024-08-14 12:40:01,927 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 12:40:16,828 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 31 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 12:40:20,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2658320.0, ans=0.1 2024-08-14 12:40:38,755 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-14 12:40:48,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-08-14 12:40:49,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2658520.0, ans=0.125 2024-08-14 12:40:50,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5000, loss[loss=0.09385, beats_loss=0.01155, ecapa_loss=0.000162, whisper_loss=0.08067, over 20790.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01079, ecapa_loss=0.0001568, whisper_loss=0.0897, over 3863236.49 frames. ], batch size: 88, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:41:01,000 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.269e+01 2.546e+01 2.965e+01 4.784e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-14 12:41:05,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2658620.0, ans=0.1 2024-08-14 12:41:16,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2658620.0, ans=0.0 2024-08-14 12:41:18,782 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 12:41:27,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2658720.0, ans=0.1 2024-08-14 12:41:32,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2658720.0, ans=0.125 2024-08-14 12:41:54,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2658920.0, ans=0.125 2024-08-14 12:41:57,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2658920.0, ans=0.125 2024-08-14 12:41:59,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.73 vs. 
limit=15.0 2024-08-14 12:42:03,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2658920.0, ans=0.2 2024-08-14 12:42:05,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5050, loss[loss=0.1013, beats_loss=0.01129, ecapa_loss=0.0001501, whisper_loss=0.08848, over 20400.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01089, ecapa_loss=0.0001578, whisper_loss=0.08928, over 3861688.40 frames. ], batch size: 83, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:42:13,548 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 12:42:21,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2659120.0, ans=0.0 2024-08-14 12:42:22,633 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 12:42:27,122 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 12:42:31,728 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 12:42:33,763 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 12:42:45,262 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 12:42:47,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2659220.0, ans=0.0 2024-08-14 12:42:48,231 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 12:42:57,339 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
11 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 12:43:07,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2659420.0, ans=0.125 2024-08-14 12:43:19,225 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-14 12:43:21,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5100, loss[loss=0.07744, beats_loss=0.01472, ecapa_loss=0.0001539, whisper_loss=0.06118, over 13743.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01093, ecapa_loss=0.0001567, whisper_loss=0.08932, over 3873453.61 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:43:24,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2659520.0, ans=0.125 2024-08-14 12:43:32,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.354e+01 2.635e+01 2.968e+01 4.253e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-14 12:43:33,965 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 12:43:40,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-14 12:44:00,336 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 12:44:07,680 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 12:44:15,296 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 12:44:26,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2659920.0, ans=0.0 2024-08-14 12:44:30,165 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
29 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 12:44:36,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5150, loss[loss=0.1098, beats_loss=0.01111, ecapa_loss=0.0001467, whisper_loss=0.09718, over 23259.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01092, ecapa_loss=0.0001554, whisper_loss=0.08943, over 3870845.71 frames. ], batch size: 92, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:44:36,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2660020.0, ans=0.125 2024-08-14 12:44:43,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2660020.0, ans=0.1 2024-08-14 12:44:49,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2660120.0, ans=0.125 2024-08-14 12:44:51,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2660120.0, ans=0.125 2024-08-14 12:44:51,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2660120.0, ans=0.07 2024-08-14 12:44:53,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2660120.0, ans=0.0 2024-08-14 12:44:59,208 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 12:45:36,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.22 vs. limit=22.5 2024-08-14 12:45:38,373 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 12:45:41,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2660420.0, ans=0.125 2024-08-14 12:45:51,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5200, loss[loss=0.1003, beats_loss=0.009824, ecapa_loss=0.0001454, whisper_loss=0.08898, over 19556.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0109, ecapa_loss=0.0001557, whisper_loss=0.08955, over 3837803.19 frames. ], batch size: 76, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:45:53,585 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.337e+00 2024-08-14 12:45:56,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2660520.0, ans=0.125 2024-08-14 12:46:02,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.363e+01 2.791e+01 3.410e+01 2.422e+02, threshold=5.583e+01, percent-clipped=4.0 2024-08-14 12:46:15,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2660620.0, ans=0.125 2024-08-14 12:46:16,635 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 12:46:27,006 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 12:46:37,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-14 12:46:45,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.75 vs. 
limit=15.0 2024-08-14 12:46:47,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2024-08-14 12:47:05,264 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 12:47:06,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5250, loss[loss=0.1032, beats_loss=0.00872, ecapa_loss=0.0001704, whisper_loss=0.09277, over 18457.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01092, ecapa_loss=0.0001554, whisper_loss=0.08931, over 3817425.98 frames. ], batch size: 74, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:47:18,315 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 12:47:48,660 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 12:47:56,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2661320.0, ans=0.125 2024-08-14 12:47:58,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2661320.0, ans=0.07 2024-08-14 12:48:00,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2661320.0, ans=10.0 2024-08-14 12:48:01,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2661320.0, ans=0.0 2024-08-14 12:48:07,362 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 12:48:10,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2661420.0, ans=0.1 2024-08-14 12:48:15,946 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
36 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 12:48:20,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5300, loss[loss=0.08893, beats_loss=0.01164, ecapa_loss=0.000216, whisper_loss=0.07513, over 13089.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01087, ecapa_loss=0.0001552, whisper_loss=0.09026, over 3828527.20 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:48:22,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2661520.0, ans=0.125 2024-08-14 12:48:24,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2661520.0, ans=0.0 2024-08-14 12:48:29,728 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.292e+01 2.528e+01 2.841e+01 9.142e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-14 12:48:48,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2661720.0, ans=0.0 2024-08-14 12:48:56,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2661720.0, ans=0.125 2024-08-14 12:49:02,796 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 12:49:17,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2661920.0, ans=0.125 2024-08-14 12:49:33,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5350, loss[loss=0.0911, beats_loss=0.0106, ecapa_loss=0.0001225, whisper_loss=0.07927, over 19544.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01084, ecapa_loss=0.0001541, whisper_loss=0.08975, over 3838434.64 frames. 
], batch size: 75, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:49:37,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2662020.0, ans=0.1 2024-08-14 12:49:45,725 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07321541011333466, model_norm_threshold=50.560279846191406 2024-08-14 12:49:45,899 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.720e+04, grad_sumsq=6.720e+04, orig_rms_sq=1.000e+00 2024-08-14 12:49:47,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=22.5 2024-08-14 12:50:11,870 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 12:50:26,862 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 25 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 12:50:48,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5400, loss[loss=0.119, beats_loss=0.0102, ecapa_loss=0.0001746, whisper_loss=0.1071, over 21482.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001542, whisper_loss=0.08997, over 3818775.74 frames. ], batch size: 87, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:50:58,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.310e+01 2.504e+01 2.679e+01 6.906e+02, threshold=5.009e+01, percent-clipped=1.0 2024-08-14 12:51:12,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-14 12:51:17,153 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 25 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-14 12:51:43,810 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 12:51:46,782 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 12:51:55,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2662920.0, ans=0.125 2024-08-14 12:52:00,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5450, loss[loss=0.09183, beats_loss=0.01264, ecapa_loss=0.0001266, whisper_loss=0.07792, over 22735.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001544, whisper_loss=0.09042, over 3805098.08 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:52:10,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2663020.0, ans=0.125 2024-08-14 12:52:16,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2663120.0, ans=0.05 2024-08-14 12:52:23,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-08-14 12:52:46,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2663320.0, ans=0.2 2024-08-14 12:52:49,535 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 12:53:03,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2663420.0, ans=0.125 2024-08-14 12:53:06,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2663420.0, ans=0.2 2024-08-14 12:53:14,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5500, loss[loss=0.1197, beats_loss=0.01112, ecapa_loss=0.0001705, whisper_loss=0.1068, over 22821.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001546, whisper_loss=0.09065, over 3851308.36 frames. ], batch size: 91, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:53:24,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.510e+01 2.706e+01 3.048e+01 6.260e+01, threshold=5.412e+01, percent-clipped=1.0 2024-08-14 12:53:29,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2663620.0, ans=0.125 2024-08-14 12:53:50,565 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 12:53:50,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2663720.0, ans=0.125 2024-08-14 12:53:55,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2663720.0, ans=0.125 2024-08-14 12:54:01,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2663820.0, ans=0.0 2024-08-14 12:54:05,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2663820.0, ans=0.05 2024-08-14 12:54:13,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0 2024-08-14 12:54:28,854 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5550, loss[loss=0.09966, beats_loss=0.01106, ecapa_loss=0.0001722, whisper_loss=0.08687, over 21882.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.000156, whisper_loss=0.09054, over 3875660.56 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:54:49,919 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 12:55:00,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2664220.0, ans=0.1 2024-08-14 12:55:03,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2664220.0, ans=0.0 2024-08-14 12:55:12,379 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
22 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-14 12:55:17,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2664320.0, ans=0.125 2024-08-14 12:55:20,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0 2024-08-14 12:55:30,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2664420.0, ans=0.1 2024-08-14 12:55:40,917 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 13 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 12:55:43,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5600, loss[loss=0.09292, beats_loss=0.01225, ecapa_loss=0.0001247, whisper_loss=0.07942, over 21096.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01083, ecapa_loss=0.0001564, whisper_loss=0.09015, over 3882653.65 frames. ], batch size: 83, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:55:47,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2024-08-14 12:55:48,727 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-14 12:55:54,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.335e+01 2.676e+01 3.034e+01 3.132e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-14 12:56:09,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2664620.0, ans=0.125 2024-08-14 12:56:19,461 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 12:56:57,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5650, loss[loss=0.1263, beats_loss=0.009213, ecapa_loss=0.0001964, whisper_loss=0.1151, over 19434.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.000156, whisper_loss=0.09082, over 3894661.86 frames. ], batch size: 78, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:57:00,834 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 12:57:06,944 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 12:57:07,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2665020.0, ans=0.0 2024-08-14 12:57:28,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2665220.0, ans=0.1 2024-08-14 12:57:30,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2024-08-14 12:57:34,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2665220.0, ans=0.125 2024-08-14 12:57:38,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2665220.0, ans=0.125 2024-08-14 12:57:38,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2665220.0, ans=0.0 2024-08-14 12:57:47,984 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:57:48,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. 
limit=15.0 2024-08-14 12:57:50,289 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-14 12:57:54,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-14 12:58:04,626 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-14 12:58:10,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5700, loss[loss=0.1004, beats_loss=0.01035, ecapa_loss=0.0001501, whisper_loss=0.0885, over 17368.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.000157, whisper_loss=0.09046, over 3923497.99 frames. ], batch size: 68, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:58:20,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.440e+01 2.658e+01 3.007e+01 5.166e+01, threshold=5.317e+01, percent-clipped=0.0 2024-08-14 12:58:27,218 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 12:58:45,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=12.0 2024-08-14 12:58:51,128 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 12:59:21,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2665920.0, ans=0.0 2024-08-14 12:59:25,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5750, loss[loss=0.1054, beats_loss=0.0105, ecapa_loss=0.0001706, whisper_loss=0.09318, over 20646.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001575, whisper_loss=0.09027, over 3900461.62 frames. 
], batch size: 82, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:59:38,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2666020.0, ans=0.0 2024-08-14 12:59:52,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2024-08-14 13:00:00,332 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 13:00:00,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2666220.0, ans=0.0 2024-08-14 13:00:06,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2666220.0, ans=0.1 2024-08-14 13:00:09,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2666320.0, ans=0.125 2024-08-14 13:00:15,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=12.0 2024-08-14 13:00:40,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5800, loss[loss=0.09692, beats_loss=0.01148, ecapa_loss=0.0001336, whisper_loss=0.08411, over 19405.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001572, whisper_loss=0.09017, over 3843824.43 frames. ], batch size: 77, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:00:50,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.337e+01 2.671e+01 3.010e+01 5.088e+01, threshold=5.343e+01, percent-clipped=0.0 2024-08-14 13:01:13,269 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 13:01:21,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2666720.0, ans=0.1 2024-08-14 13:01:24,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2666820.0, ans=10.0 2024-08-14 13:01:25,341 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 13:01:46,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.93 vs. limit=5.0 2024-08-14 13:01:47,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2666920.0, ans=0.1 2024-08-14 13:01:54,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5850, loss[loss=0.07281, beats_loss=0.01587, ecapa_loss=8.893e-05, whisper_loss=0.05605, over 16428.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001575, whisper_loss=0.0908, over 3871946.92 frames. ], batch size: 63, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:01:59,359 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 13:02:07,652 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 13:02:09,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2667120.0, ans=0.125 2024-08-14 13:02:21,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2667120.0, ans=0.0 2024-08-14 13:02:21,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.23 vs. 
limit=22.5 2024-08-14 13:02:31,555 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 13:02:45,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2667320.0, ans=0.0 2024-08-14 13:02:51,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2667320.0, ans=0.0 2024-08-14 13:02:58,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2667420.0, ans=0.0 2024-08-14 13:03:01,648 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 13:03:08,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5900, loss[loss=0.09461, beats_loss=0.0117, ecapa_loss=0.0001579, whisper_loss=0.08133, over 23312.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001573, whisper_loss=0.09052, over 3889327.60 frames. ], batch size: 96, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:03:10,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2667520.0, ans=0.035 2024-08-14 13:03:13,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2667520.0, ans=0.0 2024-08-14 13:03:18,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.335e+01 2.608e+01 2.996e+01 4.185e+01, threshold=5.216e+01, percent-clipped=0.0 2024-08-14 13:03:20,865 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 13:03:26,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2667620.0, ans=0.125 2024-08-14 13:03:30,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2667620.0, ans=0.0 2024-08-14 13:03:42,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2667720.0, ans=0.125 2024-08-14 13:03:49,015 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-14 13:04:00,965 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 13:04:11,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2667920.0, ans=0.0 2024-08-14 13:04:11,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2667920.0, ans=0.0 2024-08-14 13:04:12,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2667920.0, ans=0.125 2024-08-14 13:04:14,263 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-14 13:04:20,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2667920.0, ans=0.125 2024-08-14 13:04:22,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 5950, loss[loss=0.1065, beats_loss=0.01129, ecapa_loss=0.000154, whisper_loss=0.0937, over 19741.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001574, whisper_loss=0.09011, over 3921368.16 frames. ], batch size: 80, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:04:49,924 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 13:04:50,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-14 13:04:53,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2668220.0, ans=0.125 2024-08-14 13:04:57,751 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 13:05:20,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2668320.0, ans=0.1 2024-08-14 13:05:30,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-08-14 13:05:37,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6000, loss[loss=0.121, beats_loss=0.01021, ecapa_loss=0.0002182, whisper_loss=0.1086, over 22978.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0108, ecapa_loss=0.0001572, whisper_loss=0.08997, over 3924099.95 frames. ], batch size: 94, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:05:37,225 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 13:06:13,765 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000548, whisper_loss=0.2476, over 922467.00 frames. 2024-08-14 13:06:28,008 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.8655, 3.5723, 3.8740, 3.6920], device='cuda:0') 2024-08-14 13:06:29,971 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on SV_voxceleb1: loss=0.004318, beats_loss=0, ecapa_loss=0.0004318, whisper_loss=0, over 939242.00 frames. 
2024-08-14 13:08:03,295 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9049, 2.6972, 2.6253, 2.5906], device='cuda:0') 2024-08-14 13:08:18,953 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on AT_audioset: loss=0.02353, beats_loss=0.02353, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 13:08:18,958 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 13:08:29,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.205e+01 2.512e+01 2.812e+01 4.887e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-14 13:08:41,279 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 13:08:55,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2668720.0, ans=0.125 2024-08-14 13:09:01,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2668720.0, ans=0.125 2024-08-14 13:09:04,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2668820.0, ans=0.125 2024-08-14 13:09:07,089 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
30 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-14 13:09:07,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2668820.0, ans=0.125 2024-08-14 13:09:10,763 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0662800669670105, model_norm_threshold=50.23476028442383 2024-08-14 13:09:10,948 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.127e+05, grad_sumsq=1.141e+07, orig_rms_sq=9.876e-03 2024-08-14 13:09:14,889 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 13:09:25,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2668920.0, ans=0.0 2024-08-14 13:09:30,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-14 13:09:32,970 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 13:09:33,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2669020.0, ans=0.0 2024-08-14 13:09:34,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6050, loss[loss=0.1154, beats_loss=0.00712, ecapa_loss=0.0002031, whisper_loss=0.1062, over 15922.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001576, whisper_loss=0.09014, over 3896857.32 frames. 
], batch size: 63, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:09:39,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2669020.0, ans=0.1 2024-08-14 13:09:47,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2669120.0, ans=0.125 2024-08-14 13:09:56,750 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 13:09:56,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2669120.0, ans=0.125 2024-08-14 13:10:01,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2669120.0, ans=0.0 2024-08-14 13:10:06,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2669220.0, ans=0.125 2024-08-14 13:10:15,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2669220.0, ans=0.125 2024-08-14 13:10:23,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2669320.0, ans=0.125 2024-08-14 13:10:46,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2669420.0, ans=0.5 2024-08-14 13:10:48,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6100, loss[loss=0.09815, beats_loss=0.01072, ecapa_loss=0.000179, whisper_loss=0.08564, over 18116.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01086, ecapa_loss=0.0001566, whisper_loss=0.08945, over 3898904.51 frames. 
], batch size: 73, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:10:52,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2669520.0, ans=0.125 2024-08-14 13:10:58,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2669520.0, ans=0.0 2024-08-14 13:10:59,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.399e+01 2.783e+01 3.218e+01 7.579e+02, threshold=5.567e+01, percent-clipped=5.0 2024-08-14 13:11:01,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2669520.0, ans=0.0 2024-08-14 13:11:49,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2669920.0, ans=0.1 2024-08-14 13:11:52,524 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 13:11:59,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2669920.0, ans=0.0 2024-08-14 13:12:04,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6150, loss[loss=0.09802, beats_loss=0.01001, ecapa_loss=0.0001622, whisper_loss=0.08638, over 22426.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001578, whisper_loss=0.09013, over 3919258.14 frames. ], batch size: 92, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:12:07,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2670020.0, ans=0.125 2024-08-14 13:12:20,480 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 13:12:28,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2670120.0, ans=0.0 2024-08-14 13:12:32,088 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 13:12:44,290 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 13:13:16,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-14 13:13:18,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6200, loss[loss=0.0979, beats_loss=0.01163, ecapa_loss=0.0001624, whisper_loss=0.08464, over 17852.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001573, whisper_loss=0.09039, over 3920560.11 frames. ], batch size: 72, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:13:28,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.359e+01 2.589e+01 2.919e+01 1.541e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-14 13:13:33,693 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 13:13:33,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2670620.0, ans=0.125 2024-08-14 13:13:33,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2670620.0, ans=0.1 2024-08-14 13:13:45,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2670620.0, ans=0.125 2024-08-14 13:13:46,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2670720.0, ans=0.125 2024-08-14 13:14:00,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2670720.0, ans=0.0 2024-08-14 13:14:01,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2670820.0, ans=0.125 2024-08-14 13:14:06,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.17 vs. limit=22.5 2024-08-14 13:14:13,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=22.5 2024-08-14 13:14:15,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2670820.0, ans=0.125 2024-08-14 13:14:21,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2670920.0, ans=0.125 2024-08-14 13:14:27,726 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-14 13:14:32,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6250, loss[loss=0.1085, beats_loss=0.01221, ecapa_loss=0.0001387, whisper_loss=0.09487, over 22563.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001576, whisper_loss=0.09118, over 3906646.52 frames. ], batch size: 90, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:14:36,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2671020.0, ans=0.0 2024-08-14 13:14:45,746 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 13:14:57,358 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 13:15:00,474 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 13:15:01,899 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 13:15:14,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2671220.0, ans=0.0 2024-08-14 13:15:24,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2671320.0, ans=0.125 2024-08-14 13:15:34,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5 2024-08-14 13:15:37,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2671420.0, ans=0.125 2024-08-14 13:15:45,363 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6300, loss[loss=0.1076, beats_loss=0.009893, ecapa_loss=0.0001729, whisper_loss=0.09597, over 22495.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01055, ecapa_loss=0.000159, whisper_loss=0.09144, over 3876171.14 frames. ], batch size: 92, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:15:51,632 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 13:15:53,377 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:15:57,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.285e+01 2.511e+01 2.818e+01 8.993e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-14 13:16:02,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2671620.0, ans=0.1 2024-08-14 13:16:04,592 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 13:16:18,178 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-14 13:16:24,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2671720.0, ans=0.1 2024-08-14 13:16:28,671 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 13:16:38,969 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 13:16:39,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2671820.0, ans=0.125 2024-08-14 13:16:42,122 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 13:16:42,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2671820.0, ans=0.125 2024-08-14 13:16:46,636 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 13:17:00,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6350, loss[loss=0.09947, beats_loss=0.008452, ecapa_loss=0.0001618, whisper_loss=0.0894, over 19879.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001572, whisper_loss=0.09061, over 3864404.51 frames. ], batch size: 79, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:17:06,528 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 13:17:07,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.60 vs. limit=12.0 2024-08-14 13:17:21,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2672120.0, ans=0.125 2024-08-14 13:17:32,279 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 13:17:48,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2672320.0, ans=0.1 2024-08-14 13:17:50,125 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 13:18:01,433 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 13:18:09,017 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 13:18:14,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6400, loss[loss=0.1061, beats_loss=0.0102, ecapa_loss=0.0001719, whisper_loss=0.09418, over 21829.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001558, whisper_loss=0.09051, over 3878180.80 frames. ], batch size: 90, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:18:14,499 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 13:18:24,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2672520.0, ans=0.125 2024-08-14 13:18:25,756 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.344e+01 2.584e+01 2.860e+01 4.850e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-14 13:18:35,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2024-08-14 13:18:42,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2672720.0, ans=0.0 2024-08-14 13:19:05,760 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 13:19:20,572 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-14 13:19:28,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6450, loss[loss=0.1024, beats_loss=0.01096, ecapa_loss=0.0001876, whisper_loss=0.0896, over 21143.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001556, whisper_loss=0.0907, over 3896857.44 frames. ], batch size: 87, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:19:54,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.50 vs. limit=10.0 2024-08-14 13:19:56,546 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 13:20:08,242 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 13:20:23,075 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
24 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 13:20:30,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2673420.0, ans=0.015 2024-08-14 13:20:33,518 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 13:20:36,302 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 13:20:41,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6500, loss[loss=0.08515, beats_loss=0.0123, ecapa_loss=0.0001554, whisper_loss=0.07129, over 21585.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001559, whisper_loss=0.09115, over 3917694.61 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:20:48,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2673520.0, ans=0.125 2024-08-14 13:20:53,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.410e+01 2.661e+01 2.982e+01 1.028e+02, threshold=5.322e+01, percent-clipped=1.0 2024-08-14 13:21:13,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2673720.0, ans=0.125 2024-08-14 13:21:20,393 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 13:21:51,332 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 13:21:55,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6550, loss[loss=0.1171, beats_loss=0.01124, ecapa_loss=0.0001349, whisper_loss=0.1046, over 19199.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001562, whisper_loss=0.09108, over 3940190.60 frames. 
], batch size: 76, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:21:57,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2674020.0, ans=0.1 2024-08-14 13:22:00,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2674020.0, ans=0.2 2024-08-14 13:22:48,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2674320.0, ans=0.125 2024-08-14 13:22:51,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5 2024-08-14 13:22:52,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=8.0 2024-08-14 13:23:00,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2674420.0, ans=0.0 2024-08-14 13:23:02,738 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 13:23:05,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2674420.0, ans=0.0 2024-08-14 13:23:07,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2674520.0, ans=0.125 2024-08-14 13:23:08,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6600, loss[loss=0.1017, beats_loss=0.01259, ecapa_loss=0.0001246, whisper_loss=0.08791, over 22932.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001564, whisper_loss=0.09163, over 3963700.67 frames. 
], batch size: 89, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:23:14,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2674520.0, ans=0.125 2024-08-14 13:23:19,222 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-14 13:23:20,871 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.726e+01 3.181e+01 5.119e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-14 13:23:21,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2674520.0, ans=0.0 2024-08-14 13:23:22,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2674620.0, ans=0.125 2024-08-14 13:23:27,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2674620.0, ans=0.2 2024-08-14 13:23:31,377 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 13:23:35,548 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-14 13:23:52,251 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 13:23:55,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2674820.0, ans=0.0 2024-08-14 13:23:58,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2674820.0, ans=0.0 2024-08-14 13:23:59,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2674820.0, ans=0.125 2024-08-14 13:24:01,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2674820.0, ans=0.0 2024-08-14 13:24:13,164 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 13:24:21,644 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6650, loss[loss=0.111, beats_loss=0.008128, ecapa_loss=0.0001591, whisper_loss=0.1013, over 20886.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01071, ecapa_loss=0.0001568, whisper_loss=0.09176, over 3973204.06 frames. ], batch size: 80, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:24:30,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2675020.0, ans=0.2 2024-08-14 13:24:32,737 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 35 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 13:24:35,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0 2024-08-14 13:24:38,558 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 13:24:48,497 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 13:24:48,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2675120.0, ans=0.125 2024-08-14 13:24:50,049 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 13:25:00,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2675220.0, ans=0.125 2024-08-14 13:25:09,329 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 13:25:13,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-14 13:25:21,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2675420.0, ans=0.125 2024-08-14 13:25:29,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2675420.0, ans=0.125 2024-08-14 13:25:35,597 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6700, loss[loss=0.09714, beats_loss=0.0122, ecapa_loss=0.0001473, whisper_loss=0.08347, over 21723.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001569, whisper_loss=0.09186, over 3945523.75 frames. 
], batch size: 91, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:25:47,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.392e+01 2.630e+01 2.889e+01 1.018e+02, threshold=5.259e+01, percent-clipped=2.0 2024-08-14 13:26:01,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2675620.0, ans=0.0 2024-08-14 13:26:17,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-08-14 13:26:19,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2675820.0, ans=0.2 2024-08-14 13:26:23,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2675820.0, ans=0.125 2024-08-14 13:26:29,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2675820.0, ans=0.125 2024-08-14 13:26:46,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2675920.0, ans=0.125 2024-08-14 13:26:49,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2676020.0, ans=0.04949747468305833 2024-08-14 13:26:49,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6750, loss[loss=0.09636, beats_loss=0.00895, ecapa_loss=0.0002205, whisper_loss=0.08521, over 18061.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001556, whisper_loss=0.09134, over 3923280.82 frames. 
], batch size: 73, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:27:25,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2676220.0, ans=0.125 2024-08-14 13:27:47,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2024-08-14 13:28:02,262 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6800, loss[loss=0.1147, beats_loss=0.01054, ecapa_loss=0.0001379, whisper_loss=0.1028, over 23079.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01061, ecapa_loss=0.0001566, whisper_loss=0.09191, over 3920337.61 frames. ], batch size: 90, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:28:07,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2676520.0, ans=0.125 2024-08-14 13:28:14,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.378e+01 2.676e+01 3.043e+01 8.013e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-14 13:28:16,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2676620.0, ans=0.0 2024-08-14 13:28:19,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2676620.0, ans=0.125 2024-08-14 13:28:19,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2676620.0, ans=0.2 2024-08-14 13:28:26,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-14 13:28:31,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.22 vs. 
limit=15.0 2024-08-14 13:28:33,580 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 13:28:56,767 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-14 13:29:05,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2676920.0, ans=0.125 2024-08-14 13:29:11,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2676920.0, ans=0.0 2024-08-14 13:29:17,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6850, loss[loss=0.09608, beats_loss=0.01058, ecapa_loss=0.0001525, whisper_loss=0.08398, over 14990.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01061, ecapa_loss=0.0001566, whisper_loss=0.09179, over 3893785.23 frames. ], batch size: 58, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:29:46,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2677220.0, ans=0.125 2024-08-14 13:29:50,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2677220.0, ans=0.2 2024-08-14 13:29:51,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2677220.0, ans=0.125 2024-08-14 13:29:53,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2677220.0, ans=0.05 2024-08-14 13:30:13,021 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-14 13:30:15,983 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 13:30:19,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2677420.0, ans=0.07 2024-08-14 13:30:28,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6900, loss[loss=0.1174, beats_loss=0.01067, ecapa_loss=0.0001453, whisper_loss=0.1053, over 21270.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001556, whisper_loss=0.0921, over 3884612.32 frames. ], batch size: 85, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:30:32,870 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 13:30:33,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.97 vs. limit=6.0 2024-08-14 13:30:39,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.298e+01 2.502e+01 2.840e+01 6.631e+01, threshold=5.005e+01, percent-clipped=1.0 2024-08-14 13:30:50,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2677620.0, ans=0.125 2024-08-14 13:30:58,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2677720.0, ans=0.125 2024-08-14 13:30:59,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2677720.0, ans=0.2 2024-08-14 13:31:06,410 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 13:31:09,404 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 13:31:16,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2677820.0, ans=0.1 2024-08-14 13:31:25,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2677920.0, ans=0.1 2024-08-14 13:31:39,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 6950, loss[loss=0.1175, beats_loss=0.01009, ecapa_loss=0.0001696, whisper_loss=0.1058, over 13733.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001565, whisper_loss=0.0913, over 3880907.35 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:31:50,679 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 13:31:52,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2678120.0, ans=0.125 2024-08-14 13:31:53,769 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 13:31:57,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2678120.0, ans=15.0 2024-08-14 13:32:02,434 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-14 13:32:02,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2678120.0, ans=0.1 2024-08-14 13:32:11,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2678220.0, ans=0.125 2024-08-14 13:32:25,839 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-14 13:32:30,104 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 13:32:30,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2678320.0, ans=0.0 2024-08-14 13:32:40,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2678420.0, ans=0.125 2024-08-14 13:32:41,343 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 21 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-14 13:32:50,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7000, loss[loss=0.09689, beats_loss=0.01122, ecapa_loss=0.0001626, whisper_loss=0.08405, over 22138.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001565, whisper_loss=0.09111, over 3848235.84 frames. ], batch size: 91, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:33:01,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.255e+01 2.474e+01 2.854e+01 4.338e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-14 13:33:11,026 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-14 13:33:15,288 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 13:33:20,480 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 13:33:20,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2678720.0, ans=0.0 2024-08-14 13:33:20,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2678720.0, ans=0.1 2024-08-14 13:33:32,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2678820.0, ans=0.0 2024-08-14 13:33:33,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2678820.0, ans=0.0 2024-08-14 13:33:39,406 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 13:33:49,300 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 13:33:52,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2678920.0, ans=0.0 2024-08-14 13:34:01,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7050, loss[loss=0.1204, beats_loss=0.008854, ecapa_loss=0.0001605, whisper_loss=0.1099, over 22224.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001562, whisper_loss=0.09112, over 3849372.28 frames. ], batch size: 91, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:34:04,540 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 13:34:13,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2679020.0, ans=0.125 2024-08-14 13:34:13,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-08-14 13:34:14,583 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
29 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 13:34:17,394 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 13:34:19,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2679120.0, ans=0.1 2024-08-14 13:34:43,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2679320.0, ans=0.125 2024-08-14 13:35:03,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2679420.0, ans=0.95 2024-08-14 13:35:06,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2679420.0, ans=0.125 2024-08-14 13:35:11,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-14 13:35:13,262 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7100, loss[loss=0.1129, beats_loss=0.008915, ecapa_loss=0.0001613, whisper_loss=0.1024, over 22414.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001544, whisper_loss=0.09091, over 3833646.80 frames. ], batch size: 91, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:35:20,825 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 13:35:24,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.302e+01 2.502e+01 2.737e+01 3.925e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-14 13:35:25,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2679520.0, ans=0.5 2024-08-14 13:35:41,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2679720.0, ans=0.2 2024-08-14 13:35:46,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=22.5 2024-08-14 13:35:47,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2679720.0, ans=0.07 2024-08-14 13:35:47,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2679720.0, ans=0.09899494936611666 2024-08-14 13:35:56,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2679820.0, ans=0.1 2024-08-14 13:36:22,604 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-268000.pt 2024-08-14 13:36:30,077 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7150, loss[loss=0.1039, beats_loss=0.009521, ecapa_loss=0.0001766, whisper_loss=0.09265, over 16133.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001547, whisper_loss=0.09072, over 3836041.24 frames. 
], batch size: 66, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:36:33,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2680020.0, ans=0.125 2024-08-14 13:36:44,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2680020.0, ans=0.0 2024-08-14 13:36:52,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2680120.0, ans=0.125 2024-08-14 13:37:14,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2680220.0, ans=0.0 2024-08-14 13:37:25,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2680320.0, ans=0.125 2024-08-14 13:37:27,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-14 13:37:34,059 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 13:37:34,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2680320.0, ans=0.2 2024-08-14 13:37:35,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2680320.0, ans=0.125 2024-08-14 13:37:41,238 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-14 13:37:43,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2680420.0, ans=0.0 2024-08-14 13:37:51,633 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 13:37:52,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7200, loss[loss=0.1046, beats_loss=0.01, ecapa_loss=0.0001504, whisper_loss=0.09311, over 19642.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001564, whisper_loss=0.09069, over 3856203.04 frames. ], batch size: 75, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:38:03,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2680520.0, ans=0.0 2024-08-14 13:38:04,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.341e+01 2.648e+01 2.948e+01 9.250e+01, threshold=5.295e+01, percent-clipped=2.0 2024-08-14 13:38:07,591 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-14 13:38:21,772 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 13:38:24,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2680720.0, ans=0.05 2024-08-14 13:38:26,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2680720.0, ans=0.125 2024-08-14 13:38:28,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2680720.0, ans=0.125 2024-08-14 13:38:46,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2680820.0, ans=0.125 2024-08-14 13:38:47,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2024-08-14 13:38:56,063 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 13:39:00,456 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 13:39:07,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7250, loss[loss=0.09478, beats_loss=0.01087, ecapa_loss=0.0001412, whisper_loss=0.0825, over 18747.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001558, whisper_loss=0.09098, over 3884461.00 frames. ], batch size: 73, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:39:14,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2681020.0, ans=0.125 2024-08-14 13:39:18,786 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 37 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 13:39:20,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2681020.0, ans=0.125 2024-08-14 13:39:41,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2681220.0, ans=0.015 2024-08-14 13:39:58,465 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 13:40:06,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2024-08-14 13:40:21,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7300, loss[loss=0.08388, beats_loss=0.0155, ecapa_loss=0.0001119, whisper_loss=0.06726, over 23100.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001551, whisper_loss=0.09068, over 3881388.84 frames. 
], batch size: 95, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:40:33,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.289e+01 2.573e+01 2.951e+01 1.378e+02, threshold=5.146e+01, percent-clipped=1.0 2024-08-14 13:40:44,553 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:40:57,187 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 13:41:10,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2681820.0, ans=0.125 2024-08-14 13:41:18,403 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 13:41:20,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2681920.0, ans=0.125 2024-08-14 13:41:32,282 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 13:41:36,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7350, loss[loss=0.09647, beats_loss=0.009786, ecapa_loss=0.0001797, whisper_loss=0.08489, over 16199.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001561, whisper_loss=0.09036, over 3850815.46 frames. ], batch size: 64, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:41:44,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2682020.0, ans=0.125 2024-08-14 13:41:58,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0 2024-08-14 13:42:07,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. 
limit=22.5 2024-08-14 13:42:12,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2682220.0, ans=0.2 2024-08-14 13:42:19,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2682320.0, ans=0.0 2024-08-14 13:42:24,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2682320.0, ans=0.125 2024-08-14 13:42:36,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2682420.0, ans=0.1 2024-08-14 13:42:43,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2682420.0, ans=0.125 2024-08-14 13:42:44,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-14 13:42:50,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7400, loss[loss=0.09552, beats_loss=0.01196, ecapa_loss=0.0001136, whisper_loss=0.08243, over 17496.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001561, whisper_loss=0.08996, over 3838880.97 frames. 
], batch size: 67, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:42:52,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2682520.0, ans=0.1 2024-08-14 13:42:52,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2682520.0, ans=0.125 2024-08-14 13:43:02,169 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.321e+01 2.551e+01 2.887e+01 1.021e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-14 13:43:21,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-14 13:43:55,131 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 28 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 13:43:55,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2682920.0, ans=15.0 2024-08-14 13:44:02,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7450, loss[loss=0.08765, beats_loss=0.0126, ecapa_loss=0.0001244, whisper_loss=0.0738, over 15624.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001559, whisper_loss=0.09092, over 3837072.84 frames. ], batch size: 61, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:44:08,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2683020.0, ans=0.0 2024-08-14 13:44:12,557 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
36 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 13:44:21,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2683120.0, ans=0.125 2024-08-14 13:44:31,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.07 vs. limit=15.0 2024-08-14 13:44:33,468 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 13:44:38,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=12.0 2024-08-14 13:44:44,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2683220.0, ans=0.125 2024-08-14 13:44:44,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=22.5 2024-08-14 13:44:52,541 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-14 13:45:00,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2024-08-14 13:45:12,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2683420.0, ans=0.1 2024-08-14 13:45:16,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7500, loss[loss=0.1099, beats_loss=0.008961, ecapa_loss=0.000152, whisper_loss=0.09946, over 21130.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001553, whisper_loss=0.09116, over 3880098.99 frames. 
], batch size: 81, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:45:28,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.296e+01 2.546e+01 2.865e+01 4.082e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-14 13:45:35,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=12.0 2024-08-14 13:45:37,241 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-14 13:45:39,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2024-08-14 13:46:07,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2683820.0, ans=0.0 2024-08-14 13:46:10,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2683820.0, ans=0.2 2024-08-14 13:46:14,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2683820.0, ans=0.125 2024-08-14 13:46:32,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7550, loss[loss=0.09238, beats_loss=0.01311, ecapa_loss=0.0001836, whisper_loss=0.07744, over 21875.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001576, whisper_loss=0.09125, over 3867345.43 frames. ], batch size: 94, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:46:34,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2684020.0, ans=0.1 2024-08-14 13:46:40,205 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
17 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 13:46:52,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2024-08-14 13:46:54,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2684120.0, ans=0.0 2024-08-14 13:47:05,740 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 13:47:07,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2684220.0, ans=0.0 2024-08-14 13:47:13,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2684220.0, ans=0.125 2024-08-14 13:47:14,564 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 13:47:42,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2684420.0, ans=0.95 2024-08-14 13:47:46,621 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7600, loss[loss=0.1017, beats_loss=0.01336, ecapa_loss=0.0001657, whisper_loss=0.08666, over 19959.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001569, whisper_loss=0.09075, over 3856787.34 frames. 
], batch size: 85, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:47:58,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.371e+01 2.546e+01 2.782e+01 5.094e+01, threshold=5.091e+01, percent-clipped=1.0 2024-08-14 13:48:27,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2684720.0, ans=0.125 2024-08-14 13:48:46,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2684920.0, ans=0.125 2024-08-14 13:48:48,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-14 13:48:54,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2684920.0, ans=0.0 2024-08-14 13:49:00,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7650, loss[loss=0.09683, beats_loss=0.009952, ecapa_loss=0.0001619, whisper_loss=0.08526, over 17520.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001566, whisper_loss=0.09059, over 3868291.86 frames. ], batch size: 68, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:49:09,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. limit=10.0 2024-08-14 13:49:15,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2685120.0, ans=0.0 2024-08-14 13:49:29,204 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 13:49:38,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2685220.0, ans=0.1 2024-08-14 13:49:42,297 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 13:49:48,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2685320.0, ans=0.2 2024-08-14 13:50:08,596 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 13:50:13,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7700, loss[loss=0.1151, beats_loss=0.009268, ecapa_loss=0.0001533, whisper_loss=0.1043, over 22473.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001558, whisper_loss=0.09035, over 3860464.62 frames. ], batch size: 88, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:50:25,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2685520.0, ans=0.125 2024-08-14 13:50:25,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.559e+01 2.371e+01 2.640e+01 3.039e+01 4.657e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-14 13:50:27,986 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.036e-02 2024-08-14 13:50:28,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2685620.0, ans=0.0 2024-08-14 13:50:32,153 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 13:50:52,623 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
23 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 13:50:59,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2685820.0, ans=0.0 2024-08-14 13:51:17,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2685920.0, ans=0.2 2024-08-14 13:51:22,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2685920.0, ans=0.1 2024-08-14 13:51:26,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7750, loss[loss=0.1097, beats_loss=0.01093, ecapa_loss=0.0001695, whisper_loss=0.09707, over 15931.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001559, whisper_loss=0.09049, over 3893033.36 frames. ], batch size: 64, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:51:36,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2686020.0, ans=0.125 2024-08-14 13:51:36,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2686020.0, ans=0.1 2024-08-14 13:51:43,472 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 13:51:47,917 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 13:51:48,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2024-08-14 13:52:05,830 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 13:52:08,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.34 vs. 
limit=22.5 2024-08-14 13:52:25,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2686420.0, ans=0.0 2024-08-14 13:52:29,580 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 13:52:31,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-14 13:52:37,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=15.0 2024-08-14 13:52:40,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7800, loss[loss=0.07033, beats_loss=0.01161, ecapa_loss=0.0001338, whisper_loss=0.05739, over 14600.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001553, whisper_loss=0.0902, over 3877456.06 frames. ], batch size: 57, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:52:52,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.425e+01 2.611e+01 2.883e+01 9.855e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-14 13:53:03,278 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 13:53:11,564 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 13:53:29,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2686820.0, ans=0.2 2024-08-14 13:53:31,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2686820.0, ans=0.0 2024-08-14 13:53:51,912 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
32 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 13:53:53,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2687020.0, ans=0.0 2024-08-14 13:53:54,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7850, loss[loss=0.1061, beats_loss=0.01092, ecapa_loss=0.0001493, whisper_loss=0.0937, over 14858.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001562, whisper_loss=0.09054, over 3848465.00 frames. ], batch size: 59, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:53:57,484 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 13:54:05,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2687020.0, ans=0.125 2024-08-14 13:54:06,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2687020.0, ans=0.125 2024-08-14 13:54:11,693 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 13:54:17,774 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-14 13:54:20,647 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 13:54:27,051 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-14 13:54:39,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2687320.0, ans=0.05 2024-08-14 13:54:44,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2687320.0, ans=0.125 2024-08-14 13:54:52,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2687420.0, ans=0.125 2024-08-14 13:54:57,588 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 13:55:08,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7900, loss[loss=0.1058, beats_loss=0.01276, ecapa_loss=0.0001055, whisper_loss=0.09203, over 22710.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001563, whisper_loss=0.09032, over 3858579.29 frames. ], batch size: 88, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:55:16,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2687520.0, ans=0.0 2024-08-14 13:55:20,568 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.378e+01 2.612e+01 2.895e+01 1.059e+02, threshold=5.225e+01, percent-clipped=1.0 2024-08-14 13:55:43,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2687720.0, ans=0.2 2024-08-14 13:55:52,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2687820.0, ans=0.1 2024-08-14 13:56:14,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2687920.0, ans=0.125 2024-08-14 13:56:22,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 7950, loss[loss=0.09351, 
beats_loss=0.01204, ecapa_loss=0.0001317, whisper_loss=0.08016, over 22857.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001558, whisper_loss=0.09098, over 3891242.86 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:56:26,194 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 13:56:44,073 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 13:56:48,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2688120.0, ans=0.0 2024-08-14 13:57:00,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2688220.0, ans=0.125 2024-08-14 13:57:10,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2688320.0, ans=22.5 2024-08-14 13:57:37,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8000, loss[loss=0.09828, beats_loss=0.01041, ecapa_loss=0.0001615, whisper_loss=0.08625, over 22821.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001561, whisper_loss=0.09163, over 3898627.03 frames. ], batch size: 94, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:57:39,092 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 13:57:42,031 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 13:57:48,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.382e+01 2.629e+01 3.053e+01 3.860e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-14 13:57:57,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2688620.0, ans=0.04949747468305833 2024-08-14 13:58:09,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2024-08-14 13:58:20,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-14 13:58:29,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2688820.0, ans=0.0 2024-08-14 13:58:35,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-14 13:58:42,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2688920.0, ans=0.2 2024-08-14 13:58:45,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2688920.0, ans=0.2 2024-08-14 13:58:50,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8050, loss[loss=0.08695, beats_loss=0.01047, ecapa_loss=0.0001774, whisper_loss=0.07471, over 19905.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001555, whisper_loss=0.09091, over 3881443.57 frames. ], batch size: 80, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:58:56,766 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
28 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-14 13:59:33,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-08-14 14:00:03,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8100, loss[loss=0.06922, beats_loss=0.01091, ecapa_loss=0.0001523, whisper_loss=0.05679, over 16126.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001556, whisper_loss=0.09041, over 3871455.20 frames. ], batch size: 68, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:00:08,373 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-14 14:00:09,664 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 14:00:15,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.342e+01 2.614e+01 2.950e+01 9.116e+01, threshold=5.228e+01, percent-clipped=3.0 2024-08-14 14:00:36,610 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 17 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-14 14:00:46,666 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 14:01:15,788 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8150, loss[loss=0.09201, beats_loss=0.01189, ecapa_loss=0.0001294, whisper_loss=0.07882, over 16026.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001557, whisper_loss=0.0902, over 3869586.85 frames. 
], batch size: 61, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:01:19,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2690020.0, ans=0.125 2024-08-14 14:01:25,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2690020.0, ans=0.125 2024-08-14 14:01:33,770 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 14:01:50,576 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 14:01:58,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2690320.0, ans=0.2 2024-08-14 14:02:23,909 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.808e-03 2024-08-14 14:02:29,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8200, loss[loss=0.11, beats_loss=0.009401, ecapa_loss=0.0001509, whisper_loss=0.09907, over 15805.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01069, ecapa_loss=0.0001547, whisper_loss=0.08966, over 3889816.98 frames. ], batch size: 62, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:02:34,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2690520.0, ans=0.0 2024-08-14 14:02:40,540 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.302e+01 2.493e+01 2.763e+01 4.005e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-14 14:02:51,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2690620.0, ans=0.125 2024-08-14 14:02:54,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.17 vs. 
limit=15.0 2024-08-14 14:03:42,201 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8250, loss[loss=0.09153, beats_loss=0.01009, ecapa_loss=0.0002055, whisper_loss=0.07939, over 15229.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001559, whisper_loss=0.08964, over 3881443.38 frames. ], batch size: 60, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:03:57,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2691120.0, ans=0.5 2024-08-14 14:04:02,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2691120.0, ans=0.0 2024-08-14 14:04:03,074 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 14:04:07,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2691120.0, ans=0.125 2024-08-14 14:04:10,790 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 14:04:12,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2691220.0, ans=0.125 2024-08-14 14:04:21,450 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 14:04:26,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2691320.0, ans=0.125 2024-08-14 14:04:28,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.96 vs. 
limit=22.5 2024-08-14 14:04:35,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2691320.0, ans=0.0 2024-08-14 14:04:36,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2691320.0, ans=0.125 2024-08-14 14:04:37,868 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 14:04:38,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2691320.0, ans=0.1 2024-08-14 14:04:44,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2691420.0, ans=0.0 2024-08-14 14:04:44,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2691420.0, ans=0.0 2024-08-14 14:04:51,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=15.0 2024-08-14 14:04:52,434 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 14:04:56,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8300, loss[loss=0.1049, beats_loss=0.01146, ecapa_loss=0.0001336, whisper_loss=0.09211, over 19950.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01072, ecapa_loss=0.0001554, whisper_loss=0.08955, over 3868234.66 frames. ], batch size: 80, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:05:07,020 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
11 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 14:05:08,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.406e+01 2.618e+01 2.998e+01 6.409e+01, threshold=5.237e+01, percent-clipped=1.0 2024-08-14 14:05:21,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2691620.0, ans=0.0 2024-08-14 14:05:21,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2691620.0, ans=0.0 2024-08-14 14:05:40,977 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 14:06:10,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8350, loss[loss=0.09678, beats_loss=0.01287, ecapa_loss=0.0001249, whisper_loss=0.08266, over 17381.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01077, ecapa_loss=0.0001548, whisper_loss=0.08977, over 3880788.79 frames. ], batch size: 66, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:06:38,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2692120.0, ans=0.2 2024-08-14 14:06:45,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2692220.0, ans=10.0 2024-08-14 14:06:49,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2692220.0, ans=0.125 2024-08-14 14:06:59,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2692320.0, ans=0.0 2024-08-14 14:07:05,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2692320.0, ans=0.2 2024-08-14 14:07:06,881 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 14:07:30,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8400, loss[loss=0.09639, beats_loss=0.01217, ecapa_loss=0.0001777, whisper_loss=0.08244, over 21709.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.000155, whisper_loss=0.09032, over 3896184.02 frames. ], batch size: 90, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:07:43,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.388e+01 2.632e+01 2.972e+01 1.432e+02, threshold=5.263e+01, percent-clipped=3.0 2024-08-14 14:07:46,927 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 14:07:55,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-08-14 14:08:22,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2692820.0, ans=0.125 2024-08-14 14:08:24,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2692820.0, ans=0.1 2024-08-14 14:08:28,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=15.0 2024-08-14 14:08:29,153 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:08:48,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8450, loss[loss=0.08296, beats_loss=0.01227, ecapa_loss=0.000143, whisper_loss=0.06926, over 20601.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.09034, over 3865352.63 frames. 
], batch size: 84, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:09:13,659 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 14:09:19,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2693220.0, ans=0.0 2024-08-14 14:09:57,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2693420.0, ans=0.125 2024-08-14 14:10:06,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8500, loss[loss=0.1082, beats_loss=0.01169, ecapa_loss=0.0001307, whisper_loss=0.0952, over 22699.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.08996, over 3871104.85 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:10:17,010 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 14:10:17,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2693520.0, ans=0.125 2024-08-14 14:10:19,599 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.292e+01 2.601e+01 3.025e+01 1.070e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 14:10:19,747 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 14:10:20,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=15.0 2024-08-14 14:10:21,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2693620.0, ans=0.125 2024-08-14 14:10:22,921 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 14:10:29,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2693620.0, ans=0.125 2024-08-14 14:10:32,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=12.0 2024-08-14 14:10:45,469 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 14:10:49,320 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 14:10:49,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2693720.0, ans=0.2 2024-08-14 14:11:11,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=12.0 2024-08-14 14:11:24,252 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-14 14:11:27,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8550, loss[loss=0.0995, beats_loss=0.01346, ecapa_loss=0.0001549, whisper_loss=0.0845, over 22041.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01078, ecapa_loss=0.0001567, whisper_loss=0.08995, over 3882128.63 frames. ], batch size: 91, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:12:12,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2694220.0, ans=0.1 2024-08-14 14:12:13,751 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 14:12:19,526 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 14:12:22,890 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 14:12:26,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2694320.0, ans=0.0 2024-08-14 14:12:26,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0 2024-08-14 14:12:45,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8600, loss[loss=0.07683, beats_loss=0.01363, ecapa_loss=0.000141, whisper_loss=0.06179, over 16352.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001566, whisper_loss=0.0903, over 3876130.84 frames. ], batch size: 66, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:12:54,826 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 14:12:56,386 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 14:12:57,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.473e+01 2.757e+01 3.150e+01 4.170e+01, threshold=5.513e+01, percent-clipped=0.0 2024-08-14 14:13:14,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2694720.0, ans=0.1 2024-08-14 14:13:22,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2694720.0, ans=0.125 2024-08-14 14:13:36,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2694820.0, ans=0.125 2024-08-14 14:13:38,381 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 14:13:39,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.63 vs. 
limit=22.5 2024-08-14 14:13:51,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-14 14:13:52,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2694920.0, ans=0.125 2024-08-14 14:13:58,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2694920.0, ans=0.0 2024-08-14 14:14:03,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8650, loss[loss=0.1078, beats_loss=0.01151, ecapa_loss=0.0001803, whisper_loss=0.0945, over 19285.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001571, whisper_loss=0.0901, over 3883225.97 frames. ], batch size: 79, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:14:38,774 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 14:14:41,666 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 14:14:50,993 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 14:15:01,965 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 14:15:03,145 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 14:15:03,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2695420.0, ans=0.125 2024-08-14 14:15:04,465 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 14:15:06,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2695420.0, ans=0.1 2024-08-14 14:15:18,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8700, loss[loss=0.1063, beats_loss=0.009798, ecapa_loss=0.0001793, whisper_loss=0.09471, over 18281.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001574, whisper_loss=0.09021, over 3900257.59 frames. ], batch size: 76, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:15:26,691 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 14:15:30,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.361e+01 2.667e+01 2.943e+01 6.389e+01, threshold=5.334e+01, percent-clipped=1.0 2024-08-14 14:15:33,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2695620.0, ans=0.2 2024-08-14 14:16:02,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=12.0 2024-08-14 14:16:05,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2024-08-14 14:16:31,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8750, loss[loss=0.09313, beats_loss=0.01083, ecapa_loss=0.0001368, whisper_loss=0.08093, over 14988.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001564, whisper_loss=0.09104, over 3903829.28 frames. ], batch size: 59, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:16:38,262 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 14:16:47,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2696120.0, ans=0.0 2024-08-14 14:16:58,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2024-08-14 14:17:00,010 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 14:17:02,719 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 14:17:14,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-14 14:17:44,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8800, loss[loss=0.1294, beats_loss=0.008285, ecapa_loss=0.0001597, whisper_loss=0.1195, over 16562.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001555, whisper_loss=0.09081, over 3913865.64 frames. ], batch size: 65, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:17:44,491 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 14:17:53,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2696520.0, ans=0.2 2024-08-14 14:17:55,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.470e+01 2.757e+01 3.014e+01 7.462e+01, threshold=5.513e+01, percent-clipped=1.0 2024-08-14 14:18:09,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2696620.0, ans=0.0 2024-08-14 14:18:15,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.05 vs. 
limit=15.0 2024-08-14 14:18:27,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2696820.0, ans=0.125 2024-08-14 14:18:38,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-14 14:18:45,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2696920.0, ans=0.0 2024-08-14 14:18:49,661 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 14:18:57,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2697020.0, ans=0.0 2024-08-14 14:18:58,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8850, loss[loss=0.08407, beats_loss=0.01213, ecapa_loss=0.0001528, whisper_loss=0.07041, over 18828.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01075, ecapa_loss=0.0001545, whisper_loss=0.08985, over 3899953.79 frames. ], batch size: 79, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:19:10,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.02 vs. 
limit=22.5 2024-08-14 14:19:25,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2697120.0, ans=0.2 2024-08-14 14:19:32,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2697220.0, ans=0.125 2024-08-14 14:20:00,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2697420.0, ans=0.1 2024-08-14 14:20:11,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8900, loss[loss=0.08872, beats_loss=0.00962, ecapa_loss=0.0002042, whisper_loss=0.07706, over 21680.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01076, ecapa_loss=0.000155, whisper_loss=0.08987, over 3882535.83 frames. ], batch size: 91, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:20:23,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.627e+01 2.296e+01 2.497e+01 2.712e+01 4.460e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-14 14:20:27,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2697620.0, ans=0.125 2024-08-14 14:20:57,316 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 14:21:03,600 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 14:21:09,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2697920.0, ans=0.125 2024-08-14 14:21:12,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2697920.0, ans=0.125 2024-08-14 14:21:20,269 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
36 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 14:21:25,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 8950, loss[loss=0.1093, beats_loss=0.01153, ecapa_loss=0.0001495, whisper_loss=0.0963, over 22591.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001542, whisper_loss=0.08998, over 3882534.69 frames. ], batch size: 92, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:21:44,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2698120.0, ans=0.2 2024-08-14 14:21:58,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2698220.0, ans=0.125 2024-08-14 14:22:12,999 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 14:22:14,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2698320.0, ans=0.125 2024-08-14 14:22:19,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2698320.0, ans=0.125 2024-08-14 14:22:24,694 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 14:22:33,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2698420.0, ans=0.2 2024-08-14 14:22:33,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2698420.0, ans=0.2 2024-08-14 14:22:36,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2698420.0, ans=0.1 2024-08-14 14:22:39,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9000, loss[loss=0.1139, beats_loss=0.01028, ecapa_loss=0.0001374, whisper_loss=0.1022, over 19545.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01083, ecapa_loss=0.0001536, whisper_loss=0.09005, over 3905510.67 frames. ], batch size: 75, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:22:39,091 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 14:23:17,713 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005393, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 14:23:31,504 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6991, 4.2239, 4.4840, 4.6138], device='cuda:0') 2024-08-14 14:23:35,747 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on SV_voxceleb1: loss=0.00426, beats_loss=0, ecapa_loss=0.000426, whisper_loss=0, over 939242.00 frames. 2024-08-14 14:25:16,704 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0003, 0.0433, 0.0237, 0.0237, 0.0074, 0.0777, 0.0243, 0.0451], device='cuda:0') 2024-08-14 14:25:24,013 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 14:25:24,022 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 14:25:24,545 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-14 14:25:35,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.561e+01 2.926e+01 5.640e+01, threshold=5.122e+01, percent-clipped=1.0 2024-08-14 14:25:37,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2698620.0, ans=0.5 2024-08-14 14:25:46,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2698620.0, ans=0.0 2024-08-14 14:25:52,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2698720.0, ans=0.0 2024-08-14 14:25:52,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2698720.0, ans=0.125 2024-08-14 14:25:52,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2024-08-14 14:25:56,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.78 vs. 
limit=15.0 2024-08-14 14:25:57,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=2698720.0, ans=0.1 2024-08-14 14:25:57,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2698720.0, ans=0.2 2024-08-14 14:26:30,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2698920.0, ans=0.0 2024-08-14 14:26:33,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2698920.0, ans=0.1 2024-08-14 14:26:38,389 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9050, loss[loss=0.09957, beats_loss=0.01197, ecapa_loss=0.0001711, whisper_loss=0.08589, over 21992.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01084, ecapa_loss=0.0001537, whisper_loss=0.08992, over 3899008.06 frames. ], batch size: 91, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:26:42,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2699020.0, ans=0.125 2024-08-14 14:26:46,404 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 14:26:51,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2699020.0, ans=0.125 2024-08-14 14:27:13,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2699220.0, ans=0.125 2024-08-14 14:27:27,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2024-08-14 14:27:29,142 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 14:27:41,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2699420.0, ans=0.0 2024-08-14 14:27:46,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.15 vs. limit=10.0 2024-08-14 14:27:48,710 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 14:27:52,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9100, loss[loss=0.08597, beats_loss=0.01242, ecapa_loss=0.0001301, whisper_loss=0.07225, over 15162.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0109, ecapa_loss=0.0001542, whisper_loss=0.08967, over 3892370.68 frames. ], batch size: 61, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:28:00,510 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 14:28:00,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2699520.0, ans=0.125 2024-08-14 14:28:04,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.245e+01 2.534e+01 2.882e+01 3.902e+01, threshold=5.067e+01, percent-clipped=0.0 2024-08-14 14:28:05,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2699520.0, ans=0.2 2024-08-14 14:28:07,885 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 14:28:15,391 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-14 14:28:19,689 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 14:28:21,155 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 14:28:23,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-14 14:28:39,129 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 14:28:42,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2024-08-14 14:28:43,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2699820.0, ans=0.125 2024-08-14 14:29:00,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2699920.0, ans=0.0 2024-08-14 14:29:06,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9150, loss[loss=0.1099, beats_loss=0.009324, ecapa_loss=0.000136, whisper_loss=0.09924, over 23225.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001555, whisper_loss=0.09059, over 3915072.34 frames. ], batch size: 88, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:29:13,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-08-14 14:29:17,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2700020.0, ans=0.125 2024-08-14 14:29:19,647 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 14:29:26,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2700120.0, ans=0.125 2024-08-14 14:29:45,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2700220.0, ans=0.125 2024-08-14 14:29:46,426 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-14 14:29:49,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2700320.0, ans=0.2 2024-08-14 14:29:50,712 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 14:30:00,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-14 14:30:03,007 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 14:30:19,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9200, loss[loss=0.09515, beats_loss=0.009785, ecapa_loss=0.0001852, whisper_loss=0.08352, over 13694.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01087, ecapa_loss=0.0001556, whisper_loss=0.0899, over 3898460.89 frames. ], batch size: 57, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:30:20,162 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 14:30:23,134 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 14:30:24,523 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 14:30:31,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.280e+01 2.601e+01 2.975e+01 5.180e+01, threshold=5.201e+01, percent-clipped=1.0 2024-08-14 14:30:54,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2700720.0, ans=0.125 2024-08-14 14:30:56,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2700720.0, ans=0.125 2024-08-14 14:30:59,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2700720.0, ans=0.125 2024-08-14 14:31:01,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2700820.0, ans=0.125 2024-08-14 14:31:03,741 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.754e+01 2024-08-14 14:31:10,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2700820.0, ans=0.125 2024-08-14 14:31:12,249 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-14 14:31:15,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2700820.0, ans=0.1 2024-08-14 14:31:16,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2700920.0, ans=0.125 2024-08-14 14:31:29,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2700920.0, ans=0.0 2024-08-14 14:31:30,728 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 14:31:31,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9250, loss[loss=0.08663, beats_loss=0.01224, ecapa_loss=0.0001596, whisper_loss=0.07279, over 17066.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01084, ecapa_loss=0.0001561, whisper_loss=0.08958, over 3889196.71 frames. ], batch size: 71, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:31:39,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2701020.0, ans=0.1 2024-08-14 14:31:42,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2701020.0, ans=0.0 2024-08-14 14:31:51,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2701120.0, ans=0.1 2024-08-14 14:32:12,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2701220.0, ans=0.0 2024-08-14 14:32:16,758 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 14:32:19,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2701320.0, ans=0.125 2024-08-14 14:32:21,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2701320.0, ans=0.0 2024-08-14 14:32:25,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2701320.0, ans=0.125 2024-08-14 14:32:35,831 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
17 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 14:32:43,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9300, loss[loss=0.1083, beats_loss=0.0111, ecapa_loss=0.0001504, whisper_loss=0.09568, over 19665.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001554, whisper_loss=0.09006, over 3874742.62 frames. ], batch size: 77, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:32:56,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.362e+01 2.551e+01 2.899e+01 4.764e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-14 14:33:02,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2701620.0, ans=0.125 2024-08-14 14:33:04,647 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 14:33:14,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-14 14:33:22,090 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 14:33:23,773 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-14 14:33:30,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2701820.0, ans=0.0 2024-08-14 14:33:31,374 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-14 14:33:35,750 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 14:33:55,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2701920.0, ans=0.125 2024-08-14 14:33:57,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9350, loss[loss=0.1095, beats_loss=0.009472, ecapa_loss=0.0001755, whisper_loss=0.09831, over 18235.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01087, ecapa_loss=0.0001549, whisper_loss=0.08951, over 3876924.58 frames. ], batch size: 73, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:33:59,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2702020.0, ans=0.2 2024-08-14 14:34:12,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2702120.0, ans=0.125 2024-08-14 14:34:17,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2702120.0, ans=0.125 2024-08-14 14:34:18,270 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 14:34:37,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2702220.0, ans=0.125 2024-08-14 14:35:01,425 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 14:35:11,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9400, loss[loss=0.1022, beats_loss=0.009963, ecapa_loss=0.0001538, whisper_loss=0.09072, over 21144.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01084, ecapa_loss=0.0001543, whisper_loss=0.08984, over 3870114.42 frames. ], batch size: 86, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:35:14,982 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-14 14:35:15,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2702520.0, ans=0.09899494936611666 2024-08-14 14:35:23,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.407e+01 2.622e+01 2.905e+01 1.999e+02, threshold=5.243e+01, percent-clipped=1.0 2024-08-14 14:35:34,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=12.0 2024-08-14 14:36:05,901 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 14:36:11,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2702920.0, ans=0.0 2024-08-14 14:36:13,826 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 14:36:25,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9450, loss[loss=0.1068, beats_loss=0.009617, ecapa_loss=0.0001587, whisper_loss=0.09563, over 18813.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01086, ecapa_loss=0.000154, whisper_loss=0.08933, over 3862942.55 frames. ], batch size: 73, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:36:35,966 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 14:36:43,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2024-08-14 14:36:49,142 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.360e+00 2024-08-14 14:36:53,059 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
19 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-14 14:37:04,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2703220.0, ans=0.125 2024-08-14 14:37:06,064 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-14 14:37:14,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-08-14 14:37:18,769 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 28 from Vox, 20 fro AS 2024-08-14 14:37:23,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2703420.0, ans=0.2 2024-08-14 14:37:28,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2703420.0, ans=0.0 2024-08-14 14:37:30,275 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 14:37:32,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2703420.0, ans=0.1 2024-08-14 14:37:37,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9500, loss[loss=0.08929, beats_loss=0.01153, ecapa_loss=0.0001522, whisper_loss=0.07623, over 22001.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01087, ecapa_loss=0.0001537, whisper_loss=0.0894, over 3843617.32 frames. 
], batch size: 88, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:37:37,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2703520.0, ans=0.125 2024-08-14 14:37:42,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2703520.0, ans=0.125 2024-08-14 14:37:48,984 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.397e+01 2.649e+01 2.966e+01 9.786e+01, threshold=5.299e+01, percent-clipped=1.0 2024-08-14 14:37:52,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2703620.0, ans=0.1 2024-08-14 14:38:11,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2703720.0, ans=0.125 2024-08-14 14:38:25,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.42 vs. limit=22.5 2024-08-14 14:38:36,244 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 14:38:41,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2703920.0, ans=0.0 2024-08-14 14:38:50,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9550, loss[loss=0.09424, beats_loss=0.01269, ecapa_loss=0.0001374, whisper_loss=0.08017, over 15944.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0108, ecapa_loss=0.0001548, whisper_loss=0.08969, over 3835587.76 frames. 
], batch size: 64, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:38:54,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2704020.0, ans=0.1 2024-08-14 14:38:55,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2704020.0, ans=0.1 2024-08-14 14:38:56,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2704020.0, ans=0.015 2024-08-14 14:39:00,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2704020.0, ans=0.125 2024-08-14 14:39:05,171 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-14 14:39:09,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2704120.0, ans=0.035 2024-08-14 14:39:17,411 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 14:39:24,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2704220.0, ans=0.0 2024-08-14 14:39:30,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2704220.0, ans=0.0 2024-08-14 14:39:32,570 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 14:39:42,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2704320.0, ans=0.125 2024-08-14 14:39:43,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2704320.0, ans=0.1 2024-08-14 14:40:17,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9600, loss[loss=0.1148, beats_loss=0.01158, ecapa_loss=0.0001345, whisper_loss=0.1019, over 21537.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01084, ecapa_loss=0.0001558, whisper_loss=0.08904, over 3852573.00 frames. ], batch size: 85, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:40:25,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2704520.0, ans=0.125 2024-08-14 14:40:31,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.443e+01 2.792e+01 3.086e+01 6.637e+01, threshold=5.584e+01, percent-clipped=2.0 2024-08-14 14:40:42,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2704620.0, ans=0.1 2024-08-14 14:40:57,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2704720.0, ans=0.125 2024-08-14 14:41:19,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2024-08-14 14:41:25,562 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 14:41:48,892 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9650, loss[loss=0.1049, beats_loss=0.00883, ecapa_loss=0.0001472, whisper_loss=0.09463, over 16216.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01071, ecapa_loss=0.0001567, whisper_loss=0.08966, over 3825677.68 frames. ], batch size: 63, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:41:49,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2705020.0, ans=0.0 2024-08-14 14:42:00,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2705020.0, ans=0.125 2024-08-14 14:42:18,017 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 14:42:25,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2705220.0, ans=0.0 2024-08-14 14:42:45,857 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 14:42:48,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2705320.0, ans=0.07 2024-08-14 14:42:59,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2705420.0, ans=0.125 2024-08-14 14:43:05,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9700, loss[loss=0.1092, beats_loss=0.011, ecapa_loss=0.0001567, whisper_loss=0.09666, over 15571.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001582, whisper_loss=0.08982, over 3813656.32 frames. 
], batch size: 62, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:43:08,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2705520.0, ans=0.035 2024-08-14 14:43:17,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.210e+01 2.464e+01 2.850e+01 7.455e+01, threshold=4.928e+01, percent-clipped=1.0 2024-08-14 14:43:33,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2705620.0, ans=0.1 2024-08-14 14:43:45,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2705720.0, ans=0.125 2024-08-14 14:44:07,190 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 14:44:20,317 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9750, loss[loss=0.08612, beats_loss=0.01218, ecapa_loss=0.0001243, whisper_loss=0.07269, over 21789.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001567, whisper_loss=0.09033, over 3831307.49 frames. ], batch size: 86, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:44:27,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-08-14 14:44:28,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2706020.0, ans=0.2 2024-08-14 14:44:33,068 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-14 14:44:33,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2706020.0, ans=0.2 2024-08-14 14:44:45,615 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 14:45:06,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2706320.0, ans=0.015 2024-08-14 14:45:15,418 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:45:29,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2706420.0, ans=0.1 2024-08-14 14:45:37,054 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9800, loss[loss=0.1102, beats_loss=0.009194, ecapa_loss=0.0001411, whisper_loss=0.09956, over 18208.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01075, ecapa_loss=0.0001557, whisper_loss=0.09012, over 3835777.44 frames. ], batch size: 70, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:45:43,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2706520.0, ans=0.0 2024-08-14 14:45:43,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2706520.0, ans=0.0 2024-08-14 14:45:49,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.322e+01 2.608e+01 2.964e+01 4.916e+01, threshold=5.216e+01, percent-clipped=0.0 2024-08-14 14:45:57,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2706620.0, ans=0.1 2024-08-14 14:46:01,868 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-14 14:46:06,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.74 vs. 
limit=6.0 2024-08-14 14:46:10,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2706720.0, ans=0.0 2024-08-14 14:46:51,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9850, loss[loss=0.1088, beats_loss=0.008657, ecapa_loss=0.0001884, whisper_loss=0.09825, over 22456.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001561, whisper_loss=0.09109, over 3821646.62 frames. ], batch size: 94, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:46:52,197 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.139e-02 2024-08-14 14:46:54,944 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.762e+00 2024-08-14 14:47:00,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2707020.0, ans=10.0 2024-08-14 14:47:05,015 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 14:47:06,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=15.0 2024-08-14 14:47:22,622 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 14:47:27,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2707220.0, ans=0.1 2024-08-14 14:47:33,341 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 14:47:34,600 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 14:47:38,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2707320.0, ans=10.0 2024-08-14 14:47:50,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2707420.0, ans=0.04949747468305833 2024-08-14 14:47:51,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.30 vs. limit=10.0 2024-08-14 14:47:52,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2707420.0, ans=0.125 2024-08-14 14:47:53,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2707420.0, ans=0.0 2024-08-14 14:48:07,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9900, loss[loss=0.112, beats_loss=0.007895, ecapa_loss=0.0001845, whisper_loss=0.1023, over 17384.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001554, whisper_loss=0.0908, over 3796824.90 frames. ], batch size: 71, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:48:11,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=15.0 2024-08-14 14:48:19,579 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.713e+01 2.970e+01 4.614e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-14 14:48:23,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2707620.0, ans=0.125 2024-08-14 14:48:52,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2707720.0, ans=0.125 2024-08-14 14:48:52,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2707720.0, ans=0.0 2024-08-14 14:49:03,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=22.5 2024-08-14 14:49:06,667 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-14 14:49:10,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2707820.0, ans=0.2 2024-08-14 14:49:15,220 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.996e+05 2024-08-14 14:49:38,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 9950, loss[loss=0.08749, beats_loss=0.01206, ecapa_loss=0.0001458, whisper_loss=0.07397, over 20630.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.000155, whisper_loss=0.09098, over 3824694.85 frames. 
], batch size: 86, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:49:54,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2708020.0, ans=0.125 2024-08-14 14:49:57,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=12.0 2024-08-14 14:49:59,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2708120.0, ans=0.5 2024-08-14 14:50:17,622 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 14:50:32,842 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-14 14:50:34,652 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 14:51:18,919 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 14:51:27,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10000, loss[loss=0.1126, beats_loss=0.009505, ecapa_loss=0.0001672, whisper_loss=0.1014, over 17757.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001555, whisper_loss=0.09073, over 3838516.64 frames. ], batch size: 71, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:51:40,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2708520.0, ans=0.2 2024-08-14 14:51:46,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.366e+01 2.562e+01 2.817e+01 3.470e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-14 14:51:50,551 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 14:51:57,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2708620.0, ans=0.125 2024-08-14 14:52:11,631 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-14 14:52:32,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2708820.0, ans=0.125 2024-08-14 14:52:35,850 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.526e+01 2024-08-14 14:52:38,992 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.070e+01 2024-08-14 14:52:40,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2708920.0, ans=0.0 2024-08-14 14:52:58,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10050, loss[loss=0.1233, beats_loss=0.009464, ecapa_loss=0.0001458, whisper_loss=0.1124, over 16816.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01066, ecapa_loss=0.0001553, whisper_loss=0.09167, over 3842476.40 frames. ], batch size: 64, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:52:59,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2709020.0, ans=0.0 2024-08-14 14:53:18,342 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 14:53:27,824 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 14:53:52,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2709320.0, ans=0.125 2024-08-14 14:54:16,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10100, loss[loss=0.09757, beats_loss=0.01149, ecapa_loss=0.0001374, whisper_loss=0.08471, over 16153.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.000156, whisper_loss=0.09164, over 3862789.11 frames. ], batch size: 64, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:54:19,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-14 14:54:24,710 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 14:54:26,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2709520.0, ans=0.125 2024-08-14 14:54:29,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.269e+01 2.495e+01 2.791e+01 4.696e+01, threshold=4.989e+01, percent-clipped=0.0 2024-08-14 14:54:29,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2709520.0, ans=0.95 2024-08-14 14:54:46,172 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
22 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-14 14:54:52,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2709720.0, ans=0.1 2024-08-14 14:55:08,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2709820.0, ans=0.125 2024-08-14 14:55:20,682 WARNING [optim.py:496] (0/4) Scaling gradients by 0.040875811129808426, model_norm_threshold=49.8900260925293 2024-08-14 14:55:20,857 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.113e+05, grad_sumsq=3.113e+05, orig_rms_sq=1.000e+00 2024-08-14 14:55:24,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=12.0 2024-08-14 14:55:34,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10150, loss[loss=0.1195, beats_loss=0.01003, ecapa_loss=0.000127, whisper_loss=0.1082, over 23465.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001573, whisper_loss=0.09157, over 3873980.72 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:55:35,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2710020.0, ans=0.0 2024-08-14 14:56:01,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-14 14:56:13,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2710220.0, ans=0.0 2024-08-14 14:56:21,043 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 14:56:39,840 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 14:56:44,158 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 14:56:51,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10200, loss[loss=0.1018, beats_loss=0.00946, ecapa_loss=0.0001733, whisper_loss=0.0906, over 21938.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01055, ecapa_loss=0.0001589, whisper_loss=0.09166, over 3849792.24 frames. ], batch size: 91, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:57:00,110 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 14:57:04,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.342e+01 2.619e+01 2.972e+01 1.221e+03, threshold=5.239e+01, percent-clipped=2.0 2024-08-14 14:57:04,503 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 35 from Vox, 31 fro AS 2024-08-14 14:57:10,455 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 14:57:27,288 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 14:57:38,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=12.0 2024-08-14 14:57:42,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2710820.0, ans=0.125 2024-08-14 14:58:01,024 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 14:58:05,887 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 14:58:08,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10250, loss[loss=0.1033, beats_loss=0.01005, ecapa_loss=0.0001738, whisper_loss=0.09146, over 18940.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.0001582, whisper_loss=0.09145, over 3842549.63 frames. ], batch size: 78, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:58:14,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-14 14:58:32,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-14 14:58:44,698 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 22 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-14 14:59:18,987 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 14:59:29,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10300, loss[loss=0.129, beats_loss=0.00904, ecapa_loss=0.0001429, whisper_loss=0.1185, over 17747.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01054, ecapa_loss=0.0001567, whisper_loss=0.09174, over 3848288.01 frames. ], batch size: 64, lr: 3.27e-03, grad_scale: 1.152921504606847e+18 2024-08-14 14:59:30,765 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 14:59:35,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2711520.0, ans=0.1 2024-08-14 14:59:41,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.309e+01 2.627e+01 3.015e+01 4.712e+01, threshold=5.254e+01, percent-clipped=0.0 2024-08-14 14:59:52,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2711620.0, ans=0.0 2024-08-14 14:59:57,047 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 15:00:07,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2711720.0, ans=10.0 2024-08-14 15:00:10,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2711720.0, ans=0.125 2024-08-14 15:00:25,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2711820.0, ans=0.125 2024-08-14 15:00:54,246 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10350, loss[loss=0.1044, beats_loss=0.01128, ecapa_loss=0.0001725, whisper_loss=0.09138, over 21610.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001565, whisper_loss=0.0912, over 3876136.53 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:00:55,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2712020.0, ans=0.0 2024-08-14 15:01:03,094 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-14 15:01:08,835 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-14 15:01:10,225 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 15:01:14,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2712120.0, ans=0.025 2024-08-14 15:01:19,377 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-14 15:01:45,114 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 15:01:58,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2712420.0, ans=0.125 2024-08-14 15:02:05,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2712420.0, ans=0.125 2024-08-14 15:02:12,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10400, loss[loss=0.1043, beats_loss=0.01086, ecapa_loss=0.000153, whisper_loss=0.09195, over 21569.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001567, whisper_loss=0.09125, over 3883451.94 frames. ], batch size: 88, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:02:19,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2712520.0, ans=0.0 2024-08-14 15:02:25,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.275e+01 2.638e+01 3.125e+01 4.616e+01, threshold=5.275e+01, percent-clipped=0.0 2024-08-14 15:02:38,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=12.0 2024-08-14 15:02:50,824 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-14 15:02:59,444 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 15:02:59,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2712820.0, ans=0.0 2024-08-14 15:03:04,535 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 15:03:08,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2024-08-14 15:03:13,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2024-08-14 15:03:26,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10450, loss[loss=0.08971, beats_loss=0.01294, ecapa_loss=0.0001618, whisper_loss=0.07516, over 14639.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.000156, whisper_loss=0.09064, over 3854041.60 frames. ], batch size: 62, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:03:31,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2713020.0, ans=10.0 2024-08-14 15:03:38,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2713020.0, ans=0.125 2024-08-14 15:03:47,170 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
15 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 15:03:50,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2713120.0, ans=0.125 2024-08-14 15:04:08,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2713220.0, ans=0.125 2024-08-14 15:04:11,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2713320.0, ans=0.125 2024-08-14 15:04:13,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2713320.0, ans=0.125 2024-08-14 15:04:26,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2713420.0, ans=0.1 2024-08-14 15:04:29,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2024-08-14 15:04:36,645 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 15:04:37,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-14 15:04:41,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2713520.0, ans=0.0 2024-08-14 15:04:42,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10500, loss[loss=0.09513, beats_loss=0.009215, ecapa_loss=0.0001622, whisper_loss=0.08429, over 16117.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001558, whisper_loss=0.09041, over 3870631.72 frames. 
], batch size: 62, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:04:55,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.389e+01 2.560e+01 2.877e+01 3.688e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-14 15:05:00,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2713620.0, ans=0.07 2024-08-14 15:05:01,797 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 15:05:14,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2713720.0, ans=0.05 2024-08-14 15:05:19,395 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 15:05:27,346 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 15:05:27,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2713820.0, ans=0.2 2024-08-14 15:05:28,914 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 15:05:35,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2713820.0, ans=0.1 2024-08-14 15:05:36,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2713820.0, ans=0.125 2024-08-14 15:05:42,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2713920.0, ans=0.125 2024-08-14 15:05:56,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10550, loss[loss=0.1042, beats_loss=0.01006, ecapa_loss=0.0001596, whisper_loss=0.0925, over 14554.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001562, whisper_loss=0.09006, over 3854325.19 frames. ], batch size: 57, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:06:02,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2714020.0, ans=0.125 2024-08-14 15:06:06,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2714020.0, ans=0.125 2024-08-14 15:06:21,746 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-14 15:06:26,226 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 15:06:27,558 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 15:06:32,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-08-14 15:06:44,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2714320.0, ans=0.1 2024-08-14 15:06:46,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2714320.0, ans=0.125 2024-08-14 15:06:57,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2024-08-14 15:07:10,063 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:07:10,934 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10600, loss[loss=0.1146, beats_loss=0.008573, ecapa_loss=0.0001588, whisper_loss=0.1045, over 15068.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001559, whisper_loss=0.09101, over 3875529.34 frames. ], batch size: 56, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:07:11,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2714520.0, ans=0.125 2024-08-14 15:07:24,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.333e+01 2.524e+01 2.900e+01 4.921e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-14 15:07:38,232 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 15:07:41,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-08-14 15:07:54,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2714820.0, ans=0.0 2024-08-14 15:08:25,300 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10650, loss[loss=0.08139, beats_loss=0.01217, ecapa_loss=0.0001945, whisper_loss=0.06728, over 21601.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001553, whisper_loss=0.09178, over 3850368.47 frames. 
], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:08:44,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2715120.0, ans=0.2 2024-08-14 15:08:48,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2715120.0, ans=0.1 2024-08-14 15:09:11,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2715320.0, ans=0.1 2024-08-14 15:09:11,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2715320.0, ans=0.125 2024-08-14 15:09:36,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2715420.0, ans=0.04949747468305833 2024-08-14 15:09:39,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10700, loss[loss=0.1059, beats_loss=0.008314, ecapa_loss=0.0001325, whisper_loss=0.09631, over 20827.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.000154, whisper_loss=0.0918, over 3866144.20 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:09:49,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.81 vs. 
limit=12.0 2024-08-14 15:09:50,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2715520.0, ans=0.125 2024-08-14 15:09:52,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2715520.0, ans=0.125 2024-08-14 15:09:53,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.367e+01 2.619e+01 3.037e+01 4.020e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-14 15:10:14,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-14 15:10:34,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2715820.0, ans=0.125 2024-08-14 15:10:52,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2715920.0, ans=0.125 2024-08-14 15:10:54,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10750, loss[loss=0.1047, beats_loss=0.008194, ecapa_loss=0.0001699, whisper_loss=0.09482, over 16499.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001544, whisper_loss=0.09103, over 3842778.09 frames. ], batch size: 66, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:10:58,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2716020.0, ans=0.0 2024-08-14 15:11:31,038 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 15:11:37,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2716220.0, ans=0.1 2024-08-14 15:11:47,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2716320.0, ans=0.0 2024-08-14 15:12:09,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2716520.0, ans=0.5 2024-08-14 15:12:09,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10800, loss[loss=0.1128, beats_loss=0.01102, ecapa_loss=0.0001502, whisper_loss=0.1003, over 22750.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001552, whisper_loss=0.09093, over 3813410.08 frames. ], batch size: 92, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:12:16,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2716520.0, ans=0.125 2024-08-14 15:12:18,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2716520.0, ans=0.125 2024-08-14 15:12:23,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.404e+01 2.650e+01 3.101e+01 5.207e+01, threshold=5.300e+01, percent-clipped=0.0 2024-08-14 15:12:28,332 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 15:12:34,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2716620.0, ans=0.125 2024-08-14 15:12:46,839 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
16 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-14 15:12:49,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2716720.0, ans=0.125 2024-08-14 15:12:54,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-08-14 15:13:04,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2716820.0, ans=0.125 2024-08-14 15:13:07,364 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 15:13:07,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2716920.0, ans=0.125 2024-08-14 15:13:23,508 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10850, loss[loss=0.09567, beats_loss=0.0125, ecapa_loss=0.0001529, whisper_loss=0.08164, over 22509.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001551, whisper_loss=0.09112, over 3850897.01 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:13:30,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2717020.0, ans=0.125 2024-08-14 15:13:37,489 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 15:13:40,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2717120.0, ans=0.125 2024-08-14 15:13:46,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-14 15:13:54,277 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 15:13:57,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2717220.0, ans=0.2 2024-08-14 15:14:01,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2717220.0, ans=0.125 2024-08-14 15:14:05,082 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-14 15:14:15,815 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 15 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 15:14:18,747 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 15:14:27,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2717420.0, ans=0.125 2024-08-14 15:14:39,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10900, loss[loss=0.09688, beats_loss=0.01137, ecapa_loss=0.0001728, whisper_loss=0.08378, over 14754.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001544, whisper_loss=0.09132, over 3866720.36 frames. ], batch size: 65, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:14:52,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.325e+01 2.589e+01 2.879e+01 4.786e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-14 15:15:09,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2717720.0, ans=0.07 2024-08-14 15:15:28,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2717820.0, ans=0.125 2024-08-14 15:15:28,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.47 vs. 
limit=12.0 2024-08-14 15:15:36,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2717820.0, ans=0.125 2024-08-14 15:15:39,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2717920.0, ans=0.125 2024-08-14 15:15:53,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 10950, loss[loss=0.1068, beats_loss=0.008514, ecapa_loss=0.0001726, whisper_loss=0.09651, over 17895.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.0001545, whisper_loss=0.09194, over 3896481.47 frames. ], batch size: 73, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:15:57,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2718020.0, ans=0.1 2024-08-14 15:16:00,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2024-08-14 15:16:06,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2718020.0, ans=0.0 2024-08-14 15:16:09,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-14 15:16:19,004 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 15:16:27,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2718220.0, ans=0.125 2024-08-14 15:16:28,115 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 15:16:37,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2718320.0, ans=0.0 2024-08-14 15:16:41,351 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 15:16:42,957 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 15:16:57,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2718420.0, ans=0.125 2024-08-14 15:17:10,054 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11000, loss[loss=0.1042, beats_loss=0.01049, ecapa_loss=0.000166, whisper_loss=0.09204, over 13206.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.000156, whisper_loss=0.09157, over 3910747.80 frames. ], batch size: 55, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:17:22,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2718520.0, ans=0.125 2024-08-14 15:17:25,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.292e+01 2.575e+01 2.886e+01 4.359e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-14 15:17:43,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2718720.0, ans=0.1 2024-08-14 15:17:51,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2718720.0, ans=0.0 2024-08-14 15:18:01,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2718820.0, ans=0.0 2024-08-14 15:18:09,283 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 15:18:17,626 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
15 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 15:18:20,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2718920.0, ans=0.0 2024-08-14 15:18:34,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11050, loss[loss=0.1123, beats_loss=0.009721, ecapa_loss=0.0001969, whisper_loss=0.1006, over 22944.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01066, ecapa_loss=0.0001556, whisper_loss=0.09183, over 3933687.48 frames. ], batch size: 93, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:19:02,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2719120.0, ans=0.015 2024-08-14 15:19:08,385 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 15:19:19,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-14 15:19:20,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2719220.0, ans=0.0 2024-08-14 15:19:28,256 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 15:20:00,474 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11100, loss[loss=0.1147, beats_loss=0.01026, ecapa_loss=0.0001492, whisper_loss=0.103, over 19324.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.000154, whisper_loss=0.09158, over 3923482.53 frames. 
], batch size: 78, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:20:14,258 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.445e+01 2.651e+01 2.947e+01 5.465e+01, threshold=5.303e+01, percent-clipped=1.0 2024-08-14 15:20:19,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2719620.0, ans=0.125 2024-08-14 15:20:31,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2719720.0, ans=0.2 2024-08-14 15:20:43,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-14 15:21:02,908 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 15:21:09,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.05 vs. limit=10.0 2024-08-14 15:21:12,014 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-272000.pt 2024-08-14 15:21:19,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2720020.0, ans=0.1 2024-08-14 15:21:19,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11150, loss[loss=0.1319, beats_loss=0.008974, ecapa_loss=0.0001723, whisper_loss=0.1212, over 23430.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001537, whisper_loss=0.09126, over 3912565.97 frames. 
], batch size: 91, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:21:30,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. limit=6.0 2024-08-14 15:21:34,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2720120.0, ans=0.0 2024-08-14 15:21:39,131 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 15:21:53,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2720220.0, ans=0.1 2024-08-14 15:22:00,072 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.746e+01 2024-08-14 15:22:04,255 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 15:22:29,463 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:22:33,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11200, loss[loss=0.1219, beats_loss=0.008272, ecapa_loss=0.0001437, whisper_loss=0.1122, over 17010.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001537, whisper_loss=0.09104, over 3897859.03 frames. 
], batch size: 67, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:22:46,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.435e+01 2.587e+01 2.892e+01 4.591e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 15:22:50,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2720620.0, ans=0.125 2024-08-14 15:22:56,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2720620.0, ans=0.0 2024-08-14 15:23:32,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2024-08-14 15:23:35,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2720920.0, ans=0.125 2024-08-14 15:23:35,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2720920.0, ans=0.125 2024-08-14 15:23:47,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11250, loss[loss=0.0964, beats_loss=0.01025, ecapa_loss=0.0001578, whisper_loss=0.08456, over 15287.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001545, whisper_loss=0.09151, over 3886910.35 frames. ], batch size: 61, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:23:47,691 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 15:23:51,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2721020.0, ans=0.1 2024-08-14 15:24:05,668 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
22 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-14 15:24:28,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2721220.0, ans=0.0 2024-08-14 15:24:50,419 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 15:25:01,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2721420.0, ans=0.125 2024-08-14 15:25:08,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11300, loss[loss=0.08164, beats_loss=0.01104, ecapa_loss=0.000183, whisper_loss=0.06876, over 21182.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001542, whisper_loss=0.09089, over 3870277.62 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:25:10,065 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 15:25:10,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2721520.0, ans=0.2 2024-08-14 15:25:21,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.316e+01 2.542e+01 2.891e+01 3.051e+02, threshold=5.084e+01, percent-clipped=1.0 2024-08-14 15:25:48,621 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 15:25:51,358 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 15:25:54,931 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:25:57,525 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:26:18,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-08-14 15:26:24,170 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 15:26:25,466 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11350, loss[loss=0.121, beats_loss=0.009106, ecapa_loss=0.0001577, whisper_loss=0.1103, over 15028.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001539, whisper_loss=0.0907, over 3854096.31 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:26:28,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-14 15:26:32,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-14 15:26:33,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. 
limit=15.0 2024-08-14 15:26:45,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2722120.0, ans=0.0 2024-08-14 15:26:46,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2722120.0, ans=0.125 2024-08-14 15:27:42,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2722420.0, ans=0.5 2024-08-14 15:27:51,712 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 15:27:57,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2722520.0, ans=0.2 2024-08-14 15:27:59,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11400, loss[loss=0.1074, beats_loss=0.01097, ecapa_loss=0.0001424, whisper_loss=0.095, over 21919.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001536, whisper_loss=0.09151, over 3859216.97 frames. 
], batch size: 87, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:28:01,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2722520.0, ans=0.2 2024-08-14 15:28:13,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.371e+01 2.609e+01 2.947e+01 4.785e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-14 15:28:35,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2722720.0, ans=0.125 2024-08-14 15:28:55,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2722820.0, ans=0.125 2024-08-14 15:29:20,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2722920.0, ans=0.125 2024-08-14 15:29:31,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11450, loss[loss=0.09383, beats_loss=0.01298, ecapa_loss=0.0001364, whisper_loss=0.07949, over 18808.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01053, ecapa_loss=0.0001541, whisper_loss=0.09216, over 3845530.73 frames. ], batch size: 73, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:29:36,033 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 15:29:42,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2723020.0, ans=0.125 2024-08-14 15:30:28,455 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 15:30:30,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2723220.0, ans=0.125 2024-08-14 15:30:37,758 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 15:30:43,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2723320.0, ans=0.0 2024-08-14 15:31:28,224 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 15:31:30,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11500, loss[loss=0.09849, beats_loss=0.01092, ecapa_loss=0.0001757, whisper_loss=0.08581, over 22241.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001543, whisper_loss=0.09184, over 3866163.11 frames. ], batch size: 93, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:31:52,280 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.370e+01 2.644e+01 2.916e+01 4.086e+01, threshold=5.287e+01, percent-clipped=0.0 2024-08-14 15:32:03,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2723620.0, ans=0.0 2024-08-14 15:32:35,968 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 15:32:42,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2723820.0, ans=0.2 2024-08-14 15:32:49,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2723820.0, ans=0.0 2024-08-14 15:33:11,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2723920.0, ans=0.2 2024-08-14 15:33:16,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2723920.0, ans=0.125 2024-08-14 15:33:27,077 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
13 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 15:33:31,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11550, loss[loss=0.1119, beats_loss=0.00878, ecapa_loss=0.0001815, whisper_loss=0.1013, over 17921.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01051, ecapa_loss=0.000154, whisper_loss=0.09261, over 3889809.99 frames. ], batch size: 75, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:33:53,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2724120.0, ans=0.125 2024-08-14 15:33:59,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2724120.0, ans=0.125 2024-08-14 15:34:23,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2024-08-14 15:34:28,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2724220.0, ans=0.125 2024-08-14 15:34:58,946 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 15:35:00,495 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 15:35:03,437 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 15:35:11,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2724420.0, ans=0.125 2024-08-14 15:35:16,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11600, loss[loss=0.1071, beats_loss=0.008637, ecapa_loss=0.0001578, whisper_loss=0.09684, over 15281.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01045, ecapa_loss=0.0001547, whisper_loss=0.09294, over 3908626.96 frames. 
], batch size: 56, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:35:26,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2724520.0, ans=0.1 2024-08-14 15:35:29,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.402e+01 2.609e+01 2.881e+01 4.573e+01, threshold=5.219e+01, percent-clipped=0.0 2024-08-14 15:35:29,749 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 15:35:34,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2724620.0, ans=0.0 2024-08-14 15:35:35,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2724620.0, ans=0.125 2024-08-14 15:35:48,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2024-08-14 15:35:56,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2724720.0, ans=6.0 2024-08-14 15:35:56,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2024-08-14 15:36:00,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2724820.0, ans=0.1 2024-08-14 15:36:02,518 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
18 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 15:36:05,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2724820.0, ans=0.0 2024-08-14 15:36:14,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-14 15:36:16,932 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-14 15:36:19,777 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 15 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-14 15:36:25,593 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 15:36:28,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11650, loss[loss=0.09864, beats_loss=0.0121, ecapa_loss=0.0001504, whisper_loss=0.08504, over 19795.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01047, ecapa_loss=0.0001543, whisper_loss=0.09337, over 3928906.58 frames. ], batch size: 84, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:36:30,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2024-08-14 15:36:34,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2725020.0, ans=0.125 2024-08-14 15:36:36,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2725020.0, ans=0.125 2024-08-14 15:36:47,769 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 15:36:56,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. 
limit=15.0 2024-08-14 15:37:07,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2725220.0, ans=0.1 2024-08-14 15:37:27,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2725420.0, ans=0.1 2024-08-14 15:37:33,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2725420.0, ans=0.0 2024-08-14 15:37:42,697 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 15:37:42,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2725520.0, ans=0.125 2024-08-14 15:37:44,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11700, loss[loss=0.1093, beats_loss=0.01117, ecapa_loss=0.0001588, whisper_loss=0.09655, over 17428.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01058, ecapa_loss=0.0001553, whisper_loss=0.0923, over 3906302.91 frames. ], batch size: 70, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:37:45,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-14 15:37:59,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.338e+01 2.598e+01 2.950e+01 6.638e+01, threshold=5.196e+01, percent-clipped=2.0 2024-08-14 15:38:22,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2725720.0, ans=0.2 2024-08-14 15:38:34,172 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 15:38:39,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2725820.0, ans=0.125 2024-08-14 15:38:43,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2725820.0, ans=0.0 2024-08-14 15:38:59,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2725920.0, ans=0.125 2024-08-14 15:39:11,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11750, loss[loss=0.07669, beats_loss=0.01342, ecapa_loss=0.0001491, whisper_loss=0.06177, over 16736.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001532, whisper_loss=0.09142, over 3931561.35 frames. ], batch size: 69, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:39:13,266 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 15:39:16,669 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 15:39:17,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2726020.0, ans=0.1 2024-08-14 15:39:40,030 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 12 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 15:39:45,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2726220.0, ans=0.0 2024-08-14 15:39:52,789 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 15:40:14,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2726320.0, ans=0.125 2024-08-14 15:40:23,017 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 15:40:32,276 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11800, loss[loss=0.1068, beats_loss=0.008765, ecapa_loss=0.0001638, whisper_loss=0.09636, over 19384.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.000154, whisper_loss=0.09084, over 3913678.79 frames. ], batch size: 77, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:40:39,822 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 15:40:45,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.511e+01 2.719e+01 3.108e+01 4.014e+02, threshold=5.439e+01, percent-clipped=2.0 2024-08-14 15:40:47,115 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 15:40:54,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2726620.0, ans=0.125 2024-08-14 15:40:57,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2726620.0, ans=0.2 2024-08-14 15:41:01,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2726720.0, ans=0.125 2024-08-14 15:41:25,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2726820.0, ans=0.1 2024-08-14 15:41:30,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2726920.0, ans=0.0 2024-08-14 15:41:44,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11850, loss[loss=0.1214, beats_loss=0.009326, ecapa_loss=0.0001581, whisper_loss=0.1105, over 15236.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01081, ecapa_loss=0.0001538, whisper_loss=0.09067, over 3926192.56 frames. 
], batch size: 59, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:42:12,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2727220.0, ans=0.0 2024-08-14 15:42:15,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=2727220.0, ans=22.5 2024-08-14 15:42:20,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2727220.0, ans=0.2 2024-08-14 15:42:23,762 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 15:42:54,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2727420.0, ans=0.2 2024-08-14 15:42:56,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11900, loss[loss=0.1391, beats_loss=0.01005, ecapa_loss=0.0001285, whisper_loss=0.1278, over 23788.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01087, ecapa_loss=0.0001537, whisper_loss=0.09053, over 3945402.86 frames. 
], batch size: 90, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:43:02,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2727520.0, ans=0.125 2024-08-14 15:43:09,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.301e+01 2.664e+01 2.917e+01 5.181e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-14 15:43:15,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2727620.0, ans=0.125 2024-08-14 15:43:16,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2727620.0, ans=0.04949747468305833 2024-08-14 15:43:24,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2727720.0, ans=0.2 2024-08-14 15:43:30,357 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 15:43:47,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-14 15:43:57,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2727920.0, ans=0.05 2024-08-14 15:43:57,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2727920.0, ans=0.0 2024-08-14 15:44:09,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 11950, loss[loss=0.1069, beats_loss=0.01216, ecapa_loss=0.0001724, whisper_loss=0.09299, over 18572.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01083, ecapa_loss=0.0001554, whisper_loss=0.09014, over 3896080.84 frames. 
], batch size: 78, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:44:19,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2728020.0, ans=0.2 2024-08-14 15:44:20,944 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 15:44:32,123 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 15:44:38,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2728220.0, ans=0.0 2024-08-14 15:44:51,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2728220.0, ans=0.125 2024-08-14 15:45:10,520 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 15:45:12,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2728420.0, ans=0.2 2024-08-14 15:45:16,114 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-14 15:45:23,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12000, loss[loss=0.08833, beats_loss=0.01222, ecapa_loss=0.0001533, whisper_loss=0.07458, over 16903.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001549, whisper_loss=0.09078, over 3893949.20 frames. ], batch size: 69, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:45:23,064 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 15:46:00,617 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.000545, whisper_loss=0.2473, over 922467.00 frames. 
2024-08-14 15:46:18,357 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on SV_voxceleb1: loss=0.004271, beats_loss=0, ecapa_loss=0.0004271, whisper_loss=0, over 939242.00 frames. 2024-08-14 15:48:10,020 INFO [train_multi_KD3.py:1149] (0/4) Epoch 19, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 15:48:10,025 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 15:48:21,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2728520.0, ans=0.0 2024-08-14 15:48:23,906 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.361e+01 2.603e+01 2.893e+01 4.151e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 15:48:30,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2728620.0, ans=0.125 2024-08-14 15:48:31,871 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-14 15:48:41,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2728720.0, ans=0.1 2024-08-14 15:48:42,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2728720.0, ans=0.0 2024-08-14 15:48:45,060 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 15:48:51,161 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 15:49:09,239 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 15:49:15,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2728920.0, ans=0.0 2024-08-14 15:49:25,035 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12050, loss[loss=0.1005, beats_loss=0.0123, ecapa_loss=0.00015, whisper_loss=0.08668, over 21585.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.000155, whisper_loss=0.09055, over 3857284.29 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:49:30,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2729020.0, ans=0.1 2024-08-14 15:49:56,510 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 16 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 15:50:00,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2024-08-14 15:50:01,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2729220.0, ans=0.0 2024-08-14 15:50:10,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2729320.0, ans=0.125 2024-08-14 15:50:17,398 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 15:50:23,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2729420.0, ans=0.2 2024-08-14 15:50:26,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2729420.0, ans=0.0 2024-08-14 15:50:39,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12100, loss[loss=0.1268, beats_loss=0.01065, ecapa_loss=0.0001763, whisper_loss=0.1144, over 23230.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001559, whisper_loss=0.09085, over 3849871.10 frames. ], batch size: 96, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:50:45,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2729520.0, ans=0.0 2024-08-14 15:50:52,540 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.283e+01 2.551e+01 2.892e+01 3.951e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-14 15:50:54,339 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 15:50:54,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2729620.0, ans=0.125 2024-08-14 15:51:07,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2729720.0, ans=0.125 2024-08-14 15:51:14,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2729720.0, ans=0.125 2024-08-14 15:51:26,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2729820.0, ans=0.0 2024-08-14 15:51:51,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12150, loss[loss=0.09904, beats_loss=0.01233, ecapa_loss=0.0001807, whisper_loss=0.0849, over 19693.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001551, whisper_loss=0.09084, over 3860160.73 frames. ], batch size: 85, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:52:22,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.74 vs. limit=10.0 2024-08-14 15:52:27,591 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
36 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 15:52:28,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=12.0 2024-08-14 15:52:33,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2730220.0, ans=0.2 2024-08-14 15:52:35,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2730320.0, ans=0.0 2024-08-14 15:52:36,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2730320.0, ans=0.0 2024-08-14 15:52:50,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5 2024-08-14 15:52:59,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2730420.0, ans=0.125 2024-08-14 15:53:06,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12200, loss[loss=0.1191, beats_loss=0.01066, ecapa_loss=0.0001362, whisper_loss=0.1071, over 17218.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001548, whisper_loss=0.09088, over 3819834.46 frames. ], batch size: 67, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:53:11,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.17 vs. 
limit=15.0 2024-08-14 15:53:19,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.397e+01 2.639e+01 2.869e+01 4.830e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-14 15:53:20,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2730620.0, ans=0.09899494936611666 2024-08-14 15:53:21,159 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 15:53:22,872 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 15:53:23,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2730620.0, ans=0.125 2024-08-14 15:53:24,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2730620.0, ans=0.125 2024-08-14 15:53:25,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2730620.0, ans=0.125 2024-08-14 15:53:34,755 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 15:53:46,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2730720.0, ans=0.0 2024-08-14 15:53:52,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2730820.0, ans=0.0 2024-08-14 15:54:04,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2730920.0, ans=0.0 2024-08-14 15:54:09,976 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 15:54:14,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=12.0 2024-08-14 15:54:14,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5 2024-08-14 15:54:19,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12250, loss[loss=0.09407, beats_loss=0.01119, ecapa_loss=0.0001598, whisper_loss=0.08129, over 22089.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001554, whisper_loss=0.09138, over 3844121.86 frames. ], batch size: 92, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:54:24,164 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 15:54:47,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=22.5 2024-08-14 15:54:58,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2731220.0, ans=0.2 2024-08-14 15:55:00,836 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 15:55:02,314 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-14 15:55:05,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2731320.0, ans=0.09899494936611666 2024-08-14 15:55:18,593 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-14 15:55:32,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12300, loss[loss=0.09283, beats_loss=0.01235, ecapa_loss=0.0001639, whisper_loss=0.07884, over 22459.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.000156, whisper_loss=0.09131, over 3881614.53 frames. ], batch size: 92, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:55:41,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2731520.0, ans=0.125 2024-08-14 15:55:46,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.391e+01 2.726e+01 3.127e+01 1.434e+02, threshold=5.452e+01, percent-clipped=1.0 2024-08-14 15:56:09,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.48 vs. limit=15.0 2024-08-14 15:56:40,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2731920.0, ans=0.0 2024-08-14 15:56:46,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12350, loss[loss=0.1088, beats_loss=0.008908, ecapa_loss=0.0001829, whisper_loss=0.09808, over 16683.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01073, ecapa_loss=0.0001563, whisper_loss=0.09116, over 3907991.24 frames. ], batch size: 68, lr: 3.26e-03, grad_scale: 1.152921504606847e+18 2024-08-14 15:56:50,981 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 15:56:59,354 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 15:57:15,971 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 15:57:45,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2732420.0, ans=0.0 2024-08-14 15:57:49,812 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 15:58:00,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12400, loss[loss=0.07743, beats_loss=0.01338, ecapa_loss=0.0001426, whisper_loss=0.06263, over 21724.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.000155, whisper_loss=0.09053, over 3906758.43 frames. ], batch size: 93, lr: 3.26e-03, grad_scale: 1.152921504606847e+18 2024-08-14 15:58:00,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2732520.0, ans=0.0 2024-08-14 15:58:05,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2732520.0, ans=0.0 2024-08-14 15:58:08,019 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 15:58:13,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.330e+01 2.578e+01 2.980e+01 5.348e+02, threshold=5.156e+01, percent-clipped=2.0 2024-08-14 15:58:29,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2732720.0, ans=0.0 2024-08-14 15:58:30,974 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 15:58:33,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.06 vs. limit=22.5 2024-08-14 15:58:45,774 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 15:58:49,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2732820.0, ans=0.125 2024-08-14 15:59:09,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2732920.0, ans=0.125 2024-08-14 15:59:13,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2733020.0, ans=0.07 2024-08-14 15:59:14,829 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12450, loss[loss=0.09635, beats_loss=0.01141, ecapa_loss=0.0001457, whisper_loss=0.08348, over 21875.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001555, whisper_loss=0.09056, over 3916086.67 frames. ], batch size: 90, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:59:19,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2733020.0, ans=0.2 2024-08-14 15:59:51,799 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 15:59:55,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2733220.0, ans=0.125 2024-08-14 15:59:56,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2733220.0, ans=0.125 2024-08-14 16:00:08,374 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 16:00:17,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. 
limit=22.5
2024-08-14 16:00:25,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0
2024-08-14 16:00:30,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12500, loss[loss=0.09114, beats_loss=0.0128, ecapa_loss=0.0001434, whisper_loss=0.0769, over 15424.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.000155, whisper_loss=0.09065, over 3912550.11 frames. ], batch size: 64, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:00:45,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.338e+01 2.506e+01 2.817e+01 7.820e+01, threshold=5.011e+01, percent-clipped=1.0
2024-08-14 16:01:00,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2733720.0, ans=0.125
2024-08-14 16:01:06,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2733720.0, ans=0.035
2024-08-14 16:01:06,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.55 vs. limit=12.0
2024-08-14 16:01:10,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2733720.0, ans=0.125
2024-08-14 16:01:28,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=12.0
2024-08-14 16:01:37,121 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-14 16:01:42,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0
2024-08-14 16:01:46,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12550, loss[loss=0.09451, beats_loss=0.01079, ecapa_loss=0.0001802, whisper_loss=0.08192, over 14592.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001559, whisper_loss=0.09164, over 3917792.01 frames. ], batch size: 59, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:01:46,548 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS
2024-08-14 16:02:08,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2734120.0, ans=0.125
2024-08-14 16:02:14,174 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS
2024-08-14 16:02:17,404 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 fro AS
2024-08-14 16:02:32,336 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS
2024-08-14 16:02:50,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2734420.0, ans=0.125
2024-08-14 16:02:53,735 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS
2024-08-14 16:03:00,582 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12600, loss[loss=0.1005, beats_loss=0.01197, ecapa_loss=0.0001408, whisper_loss=0.08713, over 23123.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.0001561, whisper_loss=0.09156, over 3905285.73 frames. ], batch size: 91, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:03:10,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2734520.0, ans=0.1
2024-08-14 16:03:14,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.270e+01 2.592e+01 3.036e+01 4.281e+01, threshold=5.185e+01, percent-clipped=0.0
2024-08-14 16:03:18,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2734620.0, ans=0.125
2024-08-14 16:03:22,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2734620.0, ans=0.0
2024-08-14 16:03:26,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2734620.0, ans=0.1
2024-08-14 16:03:39,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2024-08-14 16:03:47,955 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS
2024-08-14 16:03:56,727 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 fro AS
2024-08-14 16:04:01,375 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS
2024-08-14 16:04:07,265 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 19 from Vox, 45 fro AS
2024-08-14 16:04:08,668 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 29 from Vox, 34 fro AS
2024-08-14 16:04:14,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12650, loss[loss=0.07744, beats_loss=0.01403, ecapa_loss=0.0001481, whisper_loss=0.06194, over 20469.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01063, ecapa_loss=0.0001564, whisper_loss=0.09206, over 3898247.38 frames. ], batch size: 88, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:04:34,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0
2024-08-14 16:04:39,730 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS
2024-08-14 16:04:42,893 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 14 from LS+wenet, 27 from Vox, 26 fro AS
2024-08-14 16:04:50,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0
2024-08-14 16:04:59,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2735320.0, ans=0.1
2024-08-14 16:05:12,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0
2024-08-14 16:05:22,367 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS
2024-08-14 16:05:27,829 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12700, loss[loss=0.1111, beats_loss=0.01062, ecapa_loss=0.000138, whisper_loss=0.0991, over 20876.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.000156, whisper_loss=0.09214, over 3884650.55 frames. ], batch size: 79, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:05:34,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2735520.0, ans=0.125
2024-08-14 16:05:42,523 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.369e+01 2.524e+01 2.927e+01 4.569e+01, threshold=5.048e+01, percent-clipped=0.0
2024-08-14 16:05:44,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2735620.0, ans=0.2
2024-08-14 16:05:53,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2735620.0, ans=0.125
2024-08-14 16:05:55,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2735720.0, ans=0.125
2024-08-14 16:06:16,427 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS
2024-08-14 16:06:17,922 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 13 from LS+wenet, 21 from Vox, 32 fro AS
2024-08-14 16:06:25,069 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS
2024-08-14 16:06:26,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2735920.0, ans=0.0
2024-08-14 16:06:32,447 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 24 from Vox, 30 fro AS
2024-08-14 16:06:41,528 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12750, loss[loss=0.07888, beats_loss=0.01415, ecapa_loss=0.0001292, whisper_loss=0.06344, over 15302.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001571, whisper_loss=0.09163, over 3873911.36 frames. ], batch size: 63, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:07:07,451 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 fro AS
2024-08-14 16:07:10,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2736220.0, ans=0.2
2024-08-14 16:07:14,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2736220.0, ans=0.02
2024-08-14 16:07:37,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2736320.0, ans=0.125
2024-08-14 16:07:40,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2736420.0, ans=0.125
2024-08-14 16:07:55,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12800, loss[loss=0.1001, beats_loss=0.01084, ecapa_loss=0.000156, whisper_loss=0.08772, over 22427.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001573, whisper_loss=0.09085, over 3878336.89 frames. ], batch size: 94, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:08:09,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.300e+01 2.515e+01 2.756e+01 3.404e+01, threshold=5.031e+01, percent-clipped=0.0
2024-08-14 16:08:23,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2736720.0, ans=0.125
2024-08-14 16:08:24,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2736720.0, ans=0.035
2024-08-14 16:08:27,172 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS
2024-08-14 16:08:33,141 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 23 from Vox, 26 fro AS
2024-08-14 16:09:01,621 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 20 from Vox, 20 fro AS
2024-08-14 16:09:01,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2736920.0, ans=0.0
2024-08-14 16:09:06,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0
2024-08-14 16:09:09,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12850, loss[loss=0.09928, beats_loss=0.01023, ecapa_loss=0.0001283, whisper_loss=0.08776, over 18235.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.0001579, whisper_loss=0.09019, over 3871078.12 frames. ], batch size: 69, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:09:43,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5
2024-08-14 16:10:07,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2737420.0, ans=0.125
2024-08-14 16:10:13,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2737420.0, ans=0.125
2024-08-14 16:10:13,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2737420.0, ans=0.0
2024-08-14 16:10:21,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12900, loss[loss=0.06938, beats_loss=0.01373, ecapa_loss=0.0001081, whisper_loss=0.05457, over 14923.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01087, ecapa_loss=0.0001582, whisper_loss=0.08895, over 3858171.42 frames. ], batch size: 57, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:10:23,025 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 35 from Vox, 22 fro AS
2024-08-14 16:10:25,961 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS
2024-08-14 16:10:35,672 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.212e+01 2.559e+01 2.809e+01 4.062e+01, threshold=5.118e+01, percent-clipped=0.0
2024-08-14 16:10:36,210 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS
2024-08-14 16:10:47,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0
2024-08-14 16:10:55,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737720.0, ans=0.1
2024-08-14 16:11:05,094 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS
2024-08-14 16:11:34,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 12950, loss[loss=0.1211, beats_loss=0.008916, ecapa_loss=0.0001579, whisper_loss=0.1106, over 14822.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01072, ecapa_loss=0.0001584, whisper_loss=0.08904, over 3844969.55 frames. ], batch size: 57, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:11:55,689 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS
2024-08-14 16:11:55,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2738120.0, ans=0.2
2024-08-14 16:12:01,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0
2024-08-14 16:12:03,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2738220.0, ans=0.5
2024-08-14 16:12:08,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2738220.0, ans=0.125
2024-08-14 16:12:27,393 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 18 from Vox, 25 fro AS
2024-08-14 16:12:49,597 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13000, loss[loss=0.07701, beats_loss=0.01107, ecapa_loss=0.000148, whisper_loss=0.06446, over 16178.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001572, whisper_loss=0.09001, over 3868155.83 frames. ], batch size: 63, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:12:49,928 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS
2024-08-14 16:12:51,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0
2024-08-14 16:12:55,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2738520.0, ans=0.07
2024-08-14 16:13:02,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0
2024-08-14 16:13:04,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.365e+01 2.543e+01 2.775e+01 1.627e+02, threshold=5.086e+01, percent-clipped=3.0
2024-08-14 16:13:09,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2738620.0, ans=0.125
2024-08-14 16:13:16,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2738620.0, ans=0.125
2024-08-14 16:13:19,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2738720.0, ans=0.125
2024-08-14 16:13:21,162 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS
2024-08-14 16:13:23,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2738720.0, ans=0.0
2024-08-14 16:13:43,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2738820.0, ans=0.0
2024-08-14 16:13:55,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.31 vs. limit=10.0
2024-08-14 16:14:04,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0
2024-08-14 16:14:05,334 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13050, loss[loss=0.1089, beats_loss=0.01028, ecapa_loss=0.0001172, whisper_loss=0.09746, over 15814.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.0905, over 3862443.08 frames. ], batch size: 59, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:14:13,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2739020.0, ans=0.0
2024-08-14 16:14:19,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2739120.0, ans=0.125
2024-08-14 16:14:39,651 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 14 from Vox, 28 fro AS
2024-08-14 16:15:10,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2739420.0, ans=0.125
2024-08-14 16:15:12,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0
2024-08-14 16:15:14,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=2739420.0, ans=0.02
2024-08-14 16:15:18,529 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13100, loss[loss=0.1246, beats_loss=0.009597, ecapa_loss=0.0002208, whisper_loss=0.1128, over 21016.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001552, whisper_loss=0.09031, over 3850759.37 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:15:27,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2739520.0, ans=0.125
2024-08-14 16:15:33,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.291e+01 2.498e+01 2.880e+01 4.346e+01, threshold=4.996e+01, percent-clipped=0.0
2024-08-14 16:15:52,109 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS
2024-08-14 16:15:55,091 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS
2024-08-14 16:15:59,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2739720.0, ans=0.04949747468305833
2024-08-14 16:16:00,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2739720.0, ans=0.0
2024-08-14 16:16:08,668 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS
2024-08-14 16:16:17,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2739920.0, ans=0.0
2024-08-14 16:16:17,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2739920.0, ans=0.125
2024-08-14 16:16:17,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2739920.0, ans=0.2
2024-08-14 16:16:21,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2739920.0, ans=0.2
2024-08-14 16:16:23,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2739920.0, ans=0.125
2024-08-14 16:16:27,183 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS
2024-08-14 16:16:29,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2739920.0, ans=0.125
2024-08-14 16:16:33,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13150, loss[loss=0.07828, beats_loss=0.01258, ecapa_loss=0.0001317, whisper_loss=0.06438, over 21265.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001554, whisper_loss=0.09054, over 3875991.49 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:16:50,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2740120.0, ans=0.125
2024-08-14 16:16:56,285 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 fro AS
2024-08-14 16:17:02,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2740220.0, ans=0.2
2024-08-14 16:17:04,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2740220.0, ans=0.125
2024-08-14 16:17:09,661 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-14 16:17:10,045 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 16:17:27,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0
2024-08-14 16:17:47,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13200, loss[loss=0.1089, beats_loss=0.01078, ecapa_loss=0.0001928, whisper_loss=0.09615, over 15426.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001562, whisper_loss=0.09086, over 3862273.33 frames. ], batch size: 63, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:17:49,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2740520.0, ans=0.125
2024-08-14 16:17:52,633 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS
2024-08-14 16:17:56,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2740520.0, ans=0.125
2024-08-14 16:17:58,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0
2024-08-14 16:18:02,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.404e+01 2.825e+01 3.249e+01 1.605e+02, threshold=5.649e+01, percent-clipped=1.0
2024-08-14 16:18:18,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2740720.0, ans=0.1
2024-08-14 16:18:18,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2740720.0, ans=0.125
2024-08-14 16:18:25,773 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS
2024-08-14 16:18:45,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2740920.0, ans=0.125
2024-08-14 16:18:49,919 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 fro AS
2024-08-14 16:18:55,482 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS
2024-08-14 16:19:00,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13250, loss[loss=0.1087, beats_loss=0.01131, ecapa_loss=0.0001603, whisper_loss=0.09579, over 20613.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001555, whisper_loss=0.0907, over 3844078.79 frames. ], batch size: 83, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:19:01,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=15.0
2024-08-14 16:19:10,904 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS
2024-08-14 16:19:15,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2741120.0, ans=0.125
2024-08-14 16:19:30,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741220.0, ans=0.1
2024-08-14 16:19:34,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2741220.0, ans=0.2
2024-08-14 16:19:42,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0
2024-08-14 16:19:54,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2741320.0, ans=0.125
2024-08-14 16:19:55,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2741320.0, ans=0.125
2024-08-14 16:20:12,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13300, loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001715, whisper_loss=0.09129, over 19803.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09096, over 3848159.56 frames. ], batch size: 81, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:20:25,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0
2024-08-14 16:20:27,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.357e+01 2.636e+01 2.927e+01 4.489e+01, threshold=5.273e+01, percent-clipped=0.0
2024-08-14 16:20:29,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2741620.0, ans=0.125
2024-08-14 16:20:51,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2741720.0, ans=0.1
2024-08-14 16:21:00,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2741820.0, ans=0.0
2024-08-14 16:21:21,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2741920.0, ans=0.125
2024-08-14 16:21:26,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13350, loss[loss=0.1289, beats_loss=0.007476, ecapa_loss=0.0001669, whisper_loss=0.1197, over 23381.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001559, whisper_loss=0.09121, over 3836657.49 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:21:26,854 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 18 from Vox, 42 fro AS
2024-08-14 16:21:33,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0
2024-08-14 16:21:37,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2742020.0, ans=0.125
2024-08-14 16:21:39,044 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS
2024-08-14 16:22:08,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2742220.0, ans=0.125
2024-08-14 16:22:26,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.59 vs. limit=6.0
2024-08-14 16:22:31,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2742420.0, ans=0.05
2024-08-14 16:22:41,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13400, loss[loss=0.1036, beats_loss=0.007862, ecapa_loss=0.0001958, whisper_loss=0.09378, over 16607.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001567, whisper_loss=0.09081, over 3840240.47 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:22:45,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0
2024-08-14 16:22:55,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.422e+01 2.683e+01 3.045e+01 1.877e+02, threshold=5.367e+01, percent-clipped=2.0
2024-08-14 16:23:03,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2742620.0, ans=0.2
2024-08-14 16:23:13,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2742720.0, ans=0.0
2024-08-14 16:23:27,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2742820.0, ans=0.2
2024-08-14 16:23:30,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=22.5
2024-08-14 16:23:35,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2742820.0, ans=0.2
2024-08-14 16:23:37,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2742820.0, ans=0.05
2024-08-14 16:23:54,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13450, loss[loss=0.1131, beats_loss=0.01122, ecapa_loss=0.0001148, whisper_loss=0.1007, over 23347.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001558, whisper_loss=0.09174, over 3866667.61 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:24:03,660 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 fro AS
2024-08-14 16:24:11,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2743120.0, ans=0.0
2024-08-14 16:24:19,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0
2024-08-14 16:24:23,106 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 37 from LS+wenet, 22 from Vox, 35 fro AS
2024-08-14 16:24:41,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2743320.0, ans=0.125
2024-08-14 16:24:50,022 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 14 from Vox, 43 fro AS
2024-08-14 16:25:02,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2743420.0, ans=0.0
2024-08-14 16:25:03,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2743420.0, ans=0.125
2024-08-14 16:25:05,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5
2024-08-14 16:25:07,508 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 14 from Vox, 51 fro AS
2024-08-14 16:25:07,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2743520.0, ans=0.1
2024-08-14 16:25:08,616 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13500, loss[loss=0.09723, beats_loss=0.01443, ecapa_loss=0.0001044, whisper_loss=0.08176, over 23284.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001556, whisper_loss=0.09103, over 3869443.78 frames. ], batch size: 93, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:25:14,806 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS
2024-08-14 16:25:17,697 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS
2024-08-14 16:25:23,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.281e+01 2.536e+01 2.815e+01 4.454e+01, threshold=5.072e+01, percent-clipped=0.0
2024-08-14 16:25:38,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2743720.0, ans=0.0
2024-08-14 16:25:39,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2743720.0, ans=0.1
2024-08-14 16:25:41,009 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS
2024-08-14 16:26:04,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2743820.0, ans=0.125
2024-08-14 16:26:22,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13550, loss[loss=0.09939, beats_loss=0.009907, ecapa_loss=0.0001532, whisper_loss=0.08795, over 17124.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001546, whisper_loss=0.09069, over 3861777.43 frames. ], batch size: 66, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:26:27,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2744020.0, ans=0.0
2024-08-14 16:26:31,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2744020.0, ans=0.2
2024-08-14 16:26:48,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2744120.0, ans=0.1
2024-08-14 16:26:53,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2744220.0, ans=0.2
2024-08-14 16:27:06,218 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS
2024-08-14 16:27:18,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2744320.0, ans=0.035
2024-08-14 16:27:21,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2744420.0, ans=0.125
2024-08-14 16:27:24,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2744420.0, ans=0.125
2024-08-14 16:27:30,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0
2024-08-14 16:27:34,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13600, loss[loss=0.1108, beats_loss=0.01021, ecapa_loss=0.0001341, whisper_loss=0.09925, over 16479.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001554, whisper_loss=0.09086, over 3855757.95 frames. ], batch size: 64, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:27:49,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.289e+01 2.556e+01 2.921e+01 4.683e+01, threshold=5.111e+01, percent-clipped=0.0
2024-08-14 16:27:49,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2744620.0, ans=0.0
2024-08-14 16:27:58,334 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS
2024-08-14 16:28:17,944 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS
2024-08-14 16:28:30,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2744820.0, ans=0.125
2024-08-14 16:28:30,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0
2024-08-14 16:28:34,216 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 19 from LS+wenet, 30 from Vox, 44 fro AS
2024-08-14 16:28:41,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2744920.0, ans=0.125
2024-08-14 16:28:48,840 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13650, loss[loss=0.09567, beats_loss=0.0135, ecapa_loss=0.0001288, whisper_loss=0.08087, over 22354.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.000156, whisper_loss=0.09018, over 3834989.20 frames. ], batch size: 88, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:28:49,237 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS
2024-08-14 16:28:54,582 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 fro AS
2024-08-14 16:29:05,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2745120.0, ans=0.0
2024-08-14 16:29:27,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2745220.0, ans=0.0
2024-08-14 16:29:37,916 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts.
26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 16:29:50,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2745420.0, ans=0.0 2024-08-14 16:29:53,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2745420.0, ans=0.125 2024-08-14 16:29:58,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2024-08-14 16:30:00,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.03 vs. limit=22.5 2024-08-14 16:30:02,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13700, loss[loss=0.1055, beats_loss=0.01103, ecapa_loss=0.0001311, whisper_loss=0.09311, over 23595.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001562, whisper_loss=0.09059, over 3873206.84 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:30:11,493 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 16:30:16,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. 
limit=15.0 2024-08-14 16:30:16,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.313e+01 2.534e+01 2.793e+01 4.098e+01, threshold=5.069e+01, percent-clipped=0.0 2024-08-14 16:30:18,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2745620.0, ans=0.0 2024-08-14 16:30:24,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2745620.0, ans=0.125 2024-08-14 16:30:31,556 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 16:30:34,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2745720.0, ans=0.125 2024-08-14 16:30:37,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2745720.0, ans=0.0 2024-08-14 16:30:42,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2745720.0, ans=0.0 2024-08-14 16:30:44,222 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 16:30:54,313 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-14 16:30:59,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. 
limit=15.0 2024-08-14 16:31:03,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2745920.0, ans=0.2 2024-08-14 16:31:04,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2745920.0, ans=0.125 2024-08-14 16:31:09,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2745920.0, ans=0.0 2024-08-14 16:31:14,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13750, loss[loss=0.1275, beats_loss=0.009867, ecapa_loss=0.0001768, whisper_loss=0.1159, over 16250.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001568, whisper_loss=0.09116, over 3878397.79 frames. ], batch size: 65, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:31:21,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=12.0 2024-08-14 16:31:23,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=12.0 2024-08-14 16:31:41,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=22.5 2024-08-14 16:31:45,953 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-14 16:31:47,165 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-14 16:31:57,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2746320.0, ans=0.0 2024-08-14 16:32:26,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2746420.0, ans=0.0 2024-08-14 16:32:28,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13800, loss[loss=0.1154, beats_loss=0.009734, ecapa_loss=0.0001323, whisper_loss=0.1044, over 13768.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001567, whisper_loss=0.0909, over 3872908.91 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:32:29,233 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 16:32:32,155 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 23 from Vox, 16 fro AS 2024-08-14 16:32:45,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.356e+01 2.629e+01 2.983e+01 1.767e+02, threshold=5.258e+01, percent-clipped=3.0 2024-08-14 16:32:58,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2746720.0, ans=0.0 2024-08-14 16:33:05,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-14 16:33:21,550 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 16:33:27,343 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
21 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-14 16:33:40,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2746920.0, ans=0.125 2024-08-14 16:33:43,052 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13850, loss[loss=0.1219, beats_loss=0.008681, ecapa_loss=0.0001781, whisper_loss=0.1115, over 20387.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001565, whisper_loss=0.09073, over 3838317.87 frames. ], batch size: 81, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:33:54,024 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 16:34:04,298 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 16:34:26,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2747320.0, ans=0.125 2024-08-14 16:34:56,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13900, loss[loss=0.07315, beats_loss=0.01292, ecapa_loss=0.0001728, whisper_loss=0.0585, over 18871.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001567, whisper_loss=0.09124, over 3861994.36 frames. ], batch size: 81, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:34:59,826 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 16:35:12,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.429e+01 2.660e+01 3.144e+01 1.636e+02, threshold=5.320e+01, percent-clipped=3.0 2024-08-14 16:35:15,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2747620.0, ans=0.035 2024-08-14 16:35:31,098 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 16:35:32,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2747720.0, ans=0.0 2024-08-14 16:35:42,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2747820.0, ans=0.02 2024-08-14 16:36:09,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 13950, loss[loss=0.1022, beats_loss=0.01098, ecapa_loss=0.0001504, whisper_loss=0.0897, over 15356.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001555, whisper_loss=0.09112, over 3871776.79 frames. ], batch size: 60, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:36:22,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748020.0, ans=0.1 2024-08-14 16:36:41,177 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 16:36:50,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5 2024-08-14 16:36:52,848 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 16:37:02,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2748320.0, ans=0.1 2024-08-14 16:37:04,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2024-08-14 16:37:08,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2748420.0, ans=0.125 2024-08-14 16:37:15,254 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 16:37:20,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2748420.0, ans=0.1 2024-08-14 16:37:22,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14000, loss[loss=0.09654, beats_loss=0.01135, ecapa_loss=0.0001866, whisper_loss=0.08332, over 18886.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001545, whisper_loss=0.09111, over 3875765.20 frames. ], batch size: 79, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:37:28,373 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 16:37:38,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.629e+01 3.019e+01 1.116e+02, threshold=5.259e+01, percent-clipped=1.0 2024-08-14 16:37:39,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=22.5 2024-08-14 16:38:11,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2748820.0, ans=0.04949747468305833 2024-08-14 16:38:19,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2748820.0, ans=0.125 2024-08-14 16:38:34,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2748920.0, ans=0.125 2024-08-14 16:38:36,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14050, loss[loss=0.09796, beats_loss=0.008981, ecapa_loss=0.0001814, whisper_loss=0.08716, over 14293.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001538, whisper_loss=0.09193, over 3868296.58 frames. 
], batch size: 61, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:38:37,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2749020.0, ans=0.125 2024-08-14 16:38:49,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2749020.0, ans=0.1 2024-08-14 16:38:50,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2749120.0, ans=0.0 2024-08-14 16:39:09,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-08-14 16:39:11,127 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 16:39:11,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2024-08-14 16:39:15,570 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 16:39:20,315 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 16:39:24,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2749320.0, ans=0.2 2024-08-14 16:39:25,799 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.372e+05 2024-08-14 16:39:30,123 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 16:39:42,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2749420.0, ans=0.125 2024-08-14 16:39:49,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2749520.0, ans=0.125 2024-08-14 16:39:50,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14100, loss[loss=0.1039, beats_loss=0.008309, ecapa_loss=0.0001635, whisper_loss=0.09398, over 16830.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001539, whisper_loss=0.09215, over 3896077.63 frames. ], batch size: 66, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:39:56,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2024-08-14 16:40:02,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-08-14 16:40:06,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.359e+01 2.545e+01 2.723e+01 7.272e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-14 16:40:30,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2749720.0, ans=0.125 2024-08-14 16:40:46,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2749820.0, ans=0.125 2024-08-14 16:40:51,051 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 16:40:51,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2749920.0, ans=0.0 2024-08-14 16:41:04,225 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14150, loss[loss=0.0983, beats_loss=0.009816, ecapa_loss=0.0001468, whisper_loss=0.08701, over 13739.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01066, ecapa_loss=0.0001544, whisper_loss=0.09222, over 3864936.01 frames. ], batch size: 53, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:41:27,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-08-14 16:41:44,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2750220.0, ans=0.2 2024-08-14 16:41:56,273 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 16:42:00,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2750320.0, ans=0.0 2024-08-14 16:42:05,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2750420.0, ans=0.125 2024-08-14 16:42:08,333 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 16:42:18,197 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14200, loss[loss=0.1077, beats_loss=0.01002, ecapa_loss=0.0001549, whisper_loss=0.09613, over 16275.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001539, whisper_loss=0.09221, over 3856428.47 frames. 
], batch size: 68, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:42:21,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2750520.0, ans=0.1 2024-08-14 16:42:25,799 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 16:42:34,489 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.417e+01 2.674e+01 2.957e+01 3.053e+02, threshold=5.348e+01, percent-clipped=2.0 2024-08-14 16:42:34,778 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 16:42:42,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2750620.0, ans=0.125 2024-08-14 16:42:45,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2750620.0, ans=0.0 2024-08-14 16:42:51,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2750720.0, ans=0.0 2024-08-14 16:43:26,876 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 16:43:32,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14250, loss[loss=0.1049, beats_loss=0.0116, ecapa_loss=0.0001772, whisper_loss=0.09156, over 21583.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.0001533, whisper_loss=0.09148, over 3856757.50 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:43:34,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.48 vs. 
limit=15.0 2024-08-14 16:43:37,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2751020.0, ans=0.0 2024-08-14 16:43:40,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2751020.0, ans=0.125 2024-08-14 16:43:58,214 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 16:44:00,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2751120.0, ans=0.2 2024-08-14 16:44:07,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-14 16:44:08,761 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 16:44:21,483 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 16:44:31,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2751420.0, ans=0.2 2024-08-14 16:44:45,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14300, loss[loss=0.1072, beats_loss=0.00969, ecapa_loss=0.0001394, whisper_loss=0.0961, over 20782.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001529, whisper_loss=0.09102, over 3867704.00 frames. ], batch size: 80, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:45:02,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.444e+01 2.637e+01 2.966e+01 4.430e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-14 16:45:04,365 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 16:45:29,091 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-14 16:45:54,071 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 16:45:57,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2751920.0, ans=0.0 2024-08-14 16:45:59,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14350, loss[loss=0.107, beats_loss=0.01138, ecapa_loss=9.886e-05, whisper_loss=0.09463, over 17455.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.0001522, whisper_loss=0.0907, over 3901694.73 frames. ], batch size: 64, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:46:04,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2752020.0, ans=0.5 2024-08-14 16:46:18,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2752120.0, ans=0.125 2024-08-14 16:46:28,494 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 16:46:32,744 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 16:47:01,404 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 16:47:10,169 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 16:47:16,529 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14400, loss[loss=0.08172, beats_loss=0.008699, ecapa_loss=0.0002209, whisper_loss=0.07081, over 13399.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001532, whisper_loss=0.09111, over 3906297.22 frames. 
], batch size: 56, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:47:18,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=2752520.0, ans=0.2 2024-08-14 16:47:18,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2024-08-14 16:47:26,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2752520.0, ans=0.0 2024-08-14 16:47:31,209 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 16:47:34,139 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.337e+01 2.636e+01 2.855e+01 4.364e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 16:47:43,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2752620.0, ans=0.2 2024-08-14 16:47:43,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2752620.0, ans=0.0 2024-08-14 16:48:08,464 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 16:48:10,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2752820.0, ans=0.125 2024-08-14 16:48:34,305 INFO [train_multi_KD3.py:1116] (0/4) Epoch 19, batch 14450, loss[loss=0.1007, beats_loss=0.01195, ecapa_loss=0.0001415, whisper_loss=0.08735, over 14722.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0001532, whisper_loss=0.0901, over 3919005.99 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:48:41,726 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 16:48:52,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2753120.0, ans=0.125 2024-08-14 16:48:59,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2753120.0, ans=0.0 2024-08-14 16:49:11,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2753220.0, ans=0.125 2024-08-14 16:49:11,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2753220.0, ans=0.2 2024-08-14 16:49:20,593 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 16:49:34,404 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-19.pt 2024-08-14 16:50:13,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 0, loss[loss=0.09992, beats_loss=0.01123, ecapa_loss=0.0001369, whisper_loss=0.08732, over 19154.00 frames. ], tot_loss[loss=0.09992, beats_loss=0.01123, ecapa_loss=0.0001369, whisper_loss=0.08732, over 19154.00 frames. ], batch size: 78, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:50:13,405 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 16:50:23,160 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5934, 4.0195, 4.3344, 4.5087], device='cuda:0') 2024-08-14 16:50:50,475 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005431, whisper_loss=0.2478, over 922467.00 frames. 
2024-08-14 16:51:07,196 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on SV_voxceleb1: loss=0.004351, beats_loss=0, ecapa_loss=0.0004351, whisper_loss=0, over 939242.00 frames. 2024-08-14 16:52:53,091 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on AT_audioset: loss=0.02356, beats_loss=0.02356, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 16:52:53,095 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 16:53:06,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=12.0 2024-08-14 16:53:42,854 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 16:53:46,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.322e+01 2.623e+01 2.945e+01 5.325e+01, threshold=5.246e+01, percent-clipped=1.0 2024-08-14 16:54:07,335 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 16:54:07,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2753720.0, ans=0.1 2024-08-14 16:54:56,910 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 50, loss[loss=0.09023, beats_loss=0.01056, ecapa_loss=0.0001538, whisper_loss=0.07813, over 19198.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01021, ecapa_loss=0.000156, whisper_loss=0.08891, over 890602.45 frames. ], batch size: 76, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:55:56,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2754120.0, ans=0.125 2024-08-14 16:56:03,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.17 vs. 
limit=15.0 2024-08-14 16:56:38,840 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 16:56:50,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2754420.0, ans=0.2 2024-08-14 16:56:52,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 100, loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001441, whisper_loss=0.09018, over 21816.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.009792, ecapa_loss=0.000154, whisper_loss=0.08862, over 1532485.63 frames. ], batch size: 87, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:57:38,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.579e+01 2.856e+01 3.069e+01 3.660e+02, threshold=5.711e+01, percent-clipped=1.0 2024-08-14 16:57:45,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2754620.0, ans=0.125 2024-08-14 16:58:09,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2754720.0, ans=0.125 2024-08-14 16:58:13,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754720.0, ans=0.1 2024-08-14 16:58:21,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2754820.0, ans=0.0 2024-08-14 16:58:38,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 150, loss[loss=0.1149, beats_loss=0.009139, ecapa_loss=0.0001948, whisper_loss=0.1038, over 17591.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009735, ecapa_loss=0.0001554, whisper_loss=0.0891, over 2026390.75 frames. 
], batch size: 68, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:58:55,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2754920.0, ans=0.0 2024-08-14 16:58:58,947 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 16:59:02,574 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 16:59:11,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2755020.0, ans=10.0 2024-08-14 16:59:24,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2755120.0, ans=0.125 2024-08-14 16:59:34,295 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 16:59:36,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.34 vs. limit=22.5 2024-08-14 17:00:03,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 200, loss[loss=0.08287, beats_loss=0.01226, ecapa_loss=0.0001512, whisper_loss=0.0691, over 19820.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009877, ecapa_loss=0.000155, whisper_loss=0.08997, over 2449869.68 frames. ], batch size: 82, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:00:09,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2755420.0, ans=0.125 2024-08-14 17:00:19,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2755520.0, ans=0.0 2024-08-14 17:00:22,174 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 17:00:33,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2755620.0, ans=0.1 2024-08-14 17:00:36,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.515e+01 2.860e+01 3.195e+01 5.864e+01, threshold=5.719e+01, percent-clipped=1.0 2024-08-14 17:00:37,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2755620.0, ans=0.125 2024-08-14 17:00:37,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2755620.0, ans=0.1 2024-08-14 17:00:44,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=15.0 2024-08-14 17:00:45,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2755620.0, ans=0.0 2024-08-14 17:00:47,008 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 17:00:48,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2755720.0, ans=0.0 2024-08-14 17:01:03,326 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.157e-03 2024-08-14 17:01:08,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2755820.0, ans=0.2 2024-08-14 17:01:18,195 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 250, loss[loss=0.1114, beats_loss=0.01039, ecapa_loss=0.0001032, whisper_loss=0.09996, over 18022.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01014, ecapa_loss=0.0001533, whisper_loss=0.08943, over 2762321.48 frames. 
], batch size: 64, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:01:24,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2755920.0, ans=0.0 2024-08-14 17:01:25,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2755920.0, ans=0.125 2024-08-14 17:01:44,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2756020.0, ans=0.0 2024-08-14 17:01:52,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2756120.0, ans=0.1 2024-08-14 17:01:54,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2756120.0, ans=0.125 2024-08-14 17:02:01,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2024-08-14 17:02:01,998 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 17:02:11,157 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 17:02:13,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-14 17:02:15,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2756320.0, ans=0.125 2024-08-14 17:02:19,695 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 17:02:24,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2756320.0, ans=0.125 2024-08-14 17:02:30,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 300, loss[loss=0.08827, beats_loss=0.009616, ecapa_loss=0.0001584, whisper_loss=0.07707, over 15089.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01021, ecapa_loss=0.0001551, whisper_loss=0.08928, over 2967685.06 frames. ], batch size: 59, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:02:50,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2756520.0, ans=0.1 2024-08-14 17:02:56,020 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 17:03:00,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.324e+01 2.572e+01 2.876e+01 1.018e+02, threshold=5.143e+01, percent-clipped=1.0 2024-08-14 17:03:17,208 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 17:03:28,673 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-14 17:03:41,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 350, loss[loss=0.1005, beats_loss=0.01277, ecapa_loss=0.0001498, whisper_loss=0.08627, over 18787.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01018, ecapa_loss=0.0001548, whisper_loss=0.09009, over 3142943.93 frames. 
], batch size: 74, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:04:03,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2757020.0, ans=0.125 2024-08-14 17:04:10,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2757120.0, ans=0.025 2024-08-14 17:04:11,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2024-08-14 17:04:20,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2757120.0, ans=15.0 2024-08-14 17:04:20,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2757120.0, ans=0.125 2024-08-14 17:04:24,713 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 17:04:30,205 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 17:04:48,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2757320.0, ans=0.2 2024-08-14 17:04:52,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 400, loss[loss=0.1067, beats_loss=0.009721, ecapa_loss=0.0001468, whisper_loss=0.09548, over 19268.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001533, whisper_loss=0.08964, over 3309472.44 frames. ], batch size: 75, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:05:03,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.83 vs. 
limit=22.5 2024-08-14 17:05:23,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.550e+01 2.888e+01 2.244e+02, threshold=5.100e+01, percent-clipped=1.0 2024-08-14 17:05:35,697 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 17:05:50,947 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 17:05:52,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2024-08-14 17:06:07,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 450, loss[loss=0.1039, beats_loss=0.008799, ecapa_loss=0.0001563, whisper_loss=0.09351, over 15808.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01027, ecapa_loss=0.0001542, whisper_loss=0.08985, over 3436297.91 frames. ], batch size: 61, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:06:07,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757920.0, ans=0.0 2024-08-14 17:06:13,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2757920.0, ans=0.125 2024-08-14 17:06:15,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2757920.0, ans=22.5 2024-08-14 17:06:21,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2758020.0, ans=0.2 2024-08-14 17:06:29,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2758020.0, ans=0.0 2024-08-14 17:06:36,245 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 17:06:45,586 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 17:06:48,947 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 17:06:56,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2758220.0, ans=0.125 2024-08-14 17:06:58,545 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 30 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 17:07:14,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2024-08-14 17:07:26,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2758320.0, ans=0.2 2024-08-14 17:07:29,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 500, loss[loss=0.1266, beats_loss=0.006355, ecapa_loss=0.0001698, whisper_loss=0.1186, over 20125.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01026, ecapa_loss=0.0001545, whisper_loss=0.0905, over 3523107.02 frames. ], batch size: 77, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:07:38,634 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 17:08:03,988 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.300e+01 2.536e+01 2.836e+01 8.494e+01, threshold=5.071e+01, percent-clipped=3.0 2024-08-14 17:08:13,641 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-14 17:08:36,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2024-08-14 17:08:40,454 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-14 17:08:51,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 550, loss[loss=0.1054, beats_loss=0.008487, ecapa_loss=0.000173, whisper_loss=0.09514, over 22705.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.000154, whisper_loss=0.09064, over 3622819.37 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:08:51,915 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 37 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 17:08:54,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2758920.0, ans=0.0 2024-08-14 17:09:05,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2758920.0, ans=0.2 2024-08-14 17:09:15,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2759020.0, ans=0.2 2024-08-14 17:09:23,349 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 17:09:40,097 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 17:09:41,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2759120.0, ans=0.1 2024-08-14 17:10:06,046 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 17:10:18,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 600, loss[loss=0.107, beats_loss=0.00873, ecapa_loss=0.0001845, whisper_loss=0.0964, over 22525.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.000152, whisper_loss=0.09014, over 3664087.16 frames. 
], batch size: 92, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:10:34,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2759520.0, ans=0.0 2024-08-14 17:10:38,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2759520.0, ans=0.09899494936611666 2024-08-14 17:10:47,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=12.0 2024-08-14 17:10:55,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.286e+01 2.611e+01 2.966e+01 2.824e+02, threshold=5.221e+01, percent-clipped=2.0 2024-08-14 17:10:57,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2759620.0, ans=0.125 2024-08-14 17:11:03,295 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.288e+01 2024-08-14 17:11:06,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2759620.0, ans=0.125 2024-08-14 17:11:15,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2759720.0, ans=0.125 2024-08-14 17:11:20,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.35 vs. 
limit=15.0 2024-08-14 17:11:23,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2759720.0, ans=0.1 2024-08-14 17:11:29,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2759820.0, ans=0.125 2024-08-14 17:11:30,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-14 17:11:45,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 650, loss[loss=0.09918, beats_loss=0.01121, ecapa_loss=0.0001509, whisper_loss=0.08647, over 22239.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001523, whisper_loss=0.0906, over 3730922.74 frames. ], batch size: 90, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:11:59,505 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-276000.pt 2024-08-14 17:12:02,454 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-14 17:12:08,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.59 vs. 
limit=15.0 2024-08-14 17:12:32,376 WARNING [optim.py:496] (0/4) Scaling gradients by 0.059259023517370224, model_norm_threshold=52.210243225097656 2024-08-14 17:12:32,555 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.800e+04, grad_sumsq=2.525e+04, orig_rms_sq=3.485e+00 2024-08-14 17:12:41,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2760220.0, ans=0.1 2024-08-14 17:12:46,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2760220.0, ans=0.0 2024-08-14 17:13:11,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 700, loss[loss=0.08974, beats_loss=0.009612, ecapa_loss=0.00017, whisper_loss=0.07843, over 21404.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001533, whisper_loss=0.09081, over 3763514.48 frames. ], batch size: 88, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:13:12,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-14 17:13:15,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=12.0 2024-08-14 17:13:31,166 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
26 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 17:13:46,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.378e+01 2.624e+01 2.914e+01 8.811e+02, threshold=5.248e+01, percent-clipped=3.0 2024-08-14 17:14:07,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2760720.0, ans=0.0 2024-08-14 17:14:14,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2024-08-14 17:14:28,181 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 17:14:28,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2760820.0, ans=0.2 2024-08-14 17:14:31,418 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 18 from LS+wenet, 27 from Vox, 51 fro AS 2024-08-14 17:14:34,866 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 17:14:36,173 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 750, loss[loss=0.1174, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.1054, over 22206.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001522, whisper_loss=0.09062, over 3774369.26 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:14:39,923 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 17:14:45,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2760920.0, ans=0.07 2024-08-14 17:14:46,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. 
limit=15.0 2024-08-14 17:14:51,633 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:14:58,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-14 17:15:00,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2024-08-14 17:15:03,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2761020.0, ans=0.125 2024-08-14 17:15:13,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.47 vs. limit=22.5 2024-08-14 17:15:30,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2761220.0, ans=0.0 2024-08-14 17:15:40,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2761220.0, ans=0.125 2024-08-14 17:15:47,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2761320.0, ans=0.0 2024-08-14 17:15:52,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2024-08-14 17:15:54,016 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
16 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 17:15:55,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2761320.0, ans=0.125 2024-08-14 17:15:57,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2024-08-14 17:16:00,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 800, loss[loss=0.106, beats_loss=0.01014, ecapa_loss=0.0001224, whisper_loss=0.09462, over 21759.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001508, whisper_loss=0.09049, over 3802066.92 frames. ], batch size: 83, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:16:01,482 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 17:16:05,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2761420.0, ans=0.0 2024-08-14 17:16:07,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2761420.0, ans=0.05 2024-08-14 17:16:29,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2761520.0, ans=0.125 2024-08-14 17:16:32,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2761620.0, ans=0.04949747468305833 2024-08-14 17:16:33,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.284e+01 2.457e+01 2.814e+01 4.816e+01, threshold=4.915e+01, percent-clipped=0.0 2024-08-14 17:16:45,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2761720.0, ans=0.0 2024-08-14 17:16:48,623 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2761720.0, ans=0.125 2024-08-14 17:17:06,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2761820.0, ans=0.125 2024-08-14 17:17:14,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2761820.0, ans=0.125 2024-08-14 17:17:16,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0 2024-08-14 17:17:18,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 850, loss[loss=0.1122, beats_loss=0.008866, ecapa_loss=0.0001271, whisper_loss=0.1021, over 21987.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001498, whisper_loss=0.08995, over 3786353.34 frames. ], batch size: 80, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:17:44,979 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 17:17:48,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2762020.0, ans=0.0 2024-08-14 17:17:50,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=12.0 2024-08-14 17:17:52,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2762120.0, ans=0.0 2024-08-14 17:17:56,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2762120.0, ans=0.125 2024-08-14 17:18:03,094 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
16 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 17:18:05,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2762120.0, ans=0.1 2024-08-14 17:18:07,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.05 vs. limit=10.0 2024-08-14 17:18:43,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 900, loss[loss=0.1012, beats_loss=0.01199, ecapa_loss=0.0001192, whisper_loss=0.08804, over 22959.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001495, whisper_loss=0.08993, over 3811976.15 frames. ], batch size: 88, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:18:51,598 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 17:18:57,256 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 17:19:00,392 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 17:19:05,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2762520.0, ans=0.1 2024-08-14 17:19:13,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2762520.0, ans=0.125 2024-08-14 17:19:21,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.291e+01 2.536e+01 2.780e+01 9.206e+01, threshold=5.071e+01, percent-clipped=1.0 2024-08-14 17:19:21,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2762620.0, ans=0.125 2024-08-14 17:19:22,646 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
20 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-14 17:19:24,634 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 17:19:29,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2762620.0, ans=0.125 2024-08-14 17:19:30,520 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 17:19:31,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2024-08-14 17:19:32,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2762620.0, ans=0.0 2024-08-14 17:19:37,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2024-08-14 17:19:44,474 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.618e+01 2024-08-14 17:19:56,284 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 17:19:59,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2762820.0, ans=0.0 2024-08-14 17:20:06,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 950, loss[loss=0.09288, beats_loss=0.01099, ecapa_loss=0.0001315, whisper_loss=0.08057, over 18096.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001499, whisper_loss=0.08984, over 3772143.16 frames. ], batch size: 72, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:20:22,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.64 vs. 
limit=15.0 2024-08-14 17:20:35,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2763020.0, ans=0.0 2024-08-14 17:20:43,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2763020.0, ans=0.0 2024-08-14 17:20:51,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2763120.0, ans=0.125 2024-08-14 17:21:17,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2763220.0, ans=0.04949747468305833 2024-08-14 17:21:24,692 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 17:21:37,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2763320.0, ans=0.2 2024-08-14 17:21:47,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2763320.0, ans=0.0 2024-08-14 17:21:47,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2763320.0, ans=0.125 2024-08-14 17:21:54,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1000, loss[loss=0.1151, beats_loss=0.007699, ecapa_loss=0.0001895, whisper_loss=0.1055, over 17412.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001494, whisper_loss=0.08953, over 3782590.79 frames. ], batch size: 71, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:22:05,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2763420.0, ans=0.2 2024-08-14 17:22:15,785 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 17:22:16,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2763520.0, ans=0.5 2024-08-14 17:22:27,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2763520.0, ans=0.1 2024-08-14 17:22:30,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2763620.0, ans=0.125 2024-08-14 17:22:31,865 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 17:22:33,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.269e+01 2.540e+01 2.773e+01 4.748e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-14 17:23:03,120 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.472e-01 2024-08-14 17:23:08,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2024-08-14 17:23:36,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1050, loss[loss=0.1263, beats_loss=0.007429, ecapa_loss=0.0001467, whisper_loss=0.1174, over 15088.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001504, whisper_loss=0.08952, over 3770135.48 frames. ], batch size: 57, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:23:42,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-14 17:23:58,242 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 17:23:58,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2764020.0, ans=0.1 2024-08-14 17:25:01,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2764220.0, ans=0.0 2024-08-14 17:25:19,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2764320.0, ans=0.0 2024-08-14 17:25:34,429 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 17:25:36,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1100, loss[loss=0.09853, beats_loss=0.009969, ecapa_loss=0.0001113, whisper_loss=0.08745, over 23052.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001496, whisper_loss=0.09014, over 3772776.62 frames. ], batch size: 84, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:25:42,106 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 17:26:25,034 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:26:29,712 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.358e+01 2.558e+01 2.908e+01 1.671e+02, threshold=5.116e+01, percent-clipped=1.0 2024-08-14 17:26:37,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-14 17:26:37,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.5 2024-08-14 17:26:39,172 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
30 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 17:27:39,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1150, loss[loss=0.1131, beats_loss=0.01074, ecapa_loss=0.0001338, whisper_loss=0.101, over 18067.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001495, whisper_loss=0.09097, over 3773782.20 frames. ], batch size: 70, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:28:12,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2024-08-14 17:28:13,373 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 17:28:19,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2765020.0, ans=0.1 2024-08-14 17:28:49,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2765220.0, ans=0.0 2024-08-14 17:29:08,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.97 vs. limit=10.0 2024-08-14 17:29:12,758 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 17:29:18,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2024-08-14 17:29:24,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1200, loss[loss=0.08052, beats_loss=0.01199, ecapa_loss=0.0001666, whisper_loss=0.06686, over 19200.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001504, whisper_loss=0.09032, over 3780412.45 frames. 
], batch size: 82, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:29:54,669 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.420e+01 2.672e+01 3.121e+01 5.638e+01, threshold=5.344e+01, percent-clipped=1.0 2024-08-14 17:30:00,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2765620.0, ans=0.125 2024-08-14 17:30:09,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2765720.0, ans=0.125 2024-08-14 17:30:12,572 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 17:30:23,481 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 17:30:25,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2765820.0, ans=0.125 2024-08-14 17:30:38,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1250, loss[loss=0.08384, beats_loss=0.01483, ecapa_loss=9.685e-05, whisper_loss=0.06804, over 20365.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001502, whisper_loss=0.08988, over 3807131.79 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:31:04,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2766020.0, ans=0.0 2024-08-14 17:31:05,974 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 17:31:35,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2766220.0, ans=0.0 2024-08-14 17:31:36,833 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.952e+01 2024-08-14 17:31:58,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1300, loss[loss=0.09754, beats_loss=0.01042, ecapa_loss=0.0001539, whisper_loss=0.08558, over 19937.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001495, whisper_loss=0.09013, over 3796460.81 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:32:26,136 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 17:32:31,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.335e+01 2.518e+01 2.895e+01 4.834e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-14 17:32:45,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2766720.0, ans=0.0 2024-08-14 17:33:17,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1350, loss[loss=0.0992, beats_loss=0.01028, ecapa_loss=0.0001738, whisper_loss=0.08718, over 21201.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001498, whisper_loss=0.08954, over 3789098.60 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:33:18,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2766920.0, ans=0.125 2024-08-14 17:33:19,872 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 17:33:26,318 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 17:33:28,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-08-14 17:34:01,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0 2024-08-14 17:34:03,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2767120.0, ans=0.125 2024-08-14 17:34:08,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-14 17:34:10,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2767220.0, ans=0.1 2024-08-14 17:34:20,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0 2024-08-14 17:34:41,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2767420.0, ans=0.0 2024-08-14 17:34:42,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1400, loss[loss=0.09474, beats_loss=0.01182, ecapa_loss=0.0001738, whisper_loss=0.08118, over 18899.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.08937, over 3799326.44 frames. 
], batch size: 80, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:35:08,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2767520.0, ans=0.0 2024-08-14 17:35:14,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2767620.0, ans=0.0 2024-08-14 17:35:16,775 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 17:35:18,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.291e+01 2.563e+01 2.822e+01 1.881e+02, threshold=5.126e+01, percent-clipped=2.0 2024-08-14 17:35:27,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2767620.0, ans=0.125 2024-08-14 17:35:43,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2767720.0, ans=0.125 2024-08-14 17:35:51,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2767820.0, ans=0.05 2024-08-14 17:36:00,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2767820.0, ans=0.2 2024-08-14 17:36:02,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2767820.0, ans=0.0 2024-08-14 17:36:41,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1450, loss[loss=0.07973, beats_loss=0.01303, ecapa_loss=0.0001425, whisper_loss=0.06528, over 16584.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01063, ecapa_loss=0.00015, whisper_loss=0.08938, over 3815796.14 frames. ], batch size: 68, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:36:48,988 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 17:36:49,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-14 17:37:14,080 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-14 17:37:16,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2768120.0, ans=0.0 2024-08-14 17:37:37,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2768220.0, ans=0.125 2024-08-14 17:37:39,471 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 17:37:45,516 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 17:37:50,226 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 17:38:01,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2768420.0, ans=0.0 2024-08-14 17:38:03,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1500, loss[loss=0.09126, beats_loss=0.008746, ecapa_loss=0.0001734, whisper_loss=0.08078, over 15903.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001509, whisper_loss=0.08932, over 3809904.77 frames. 
], batch size: 61, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:38:05,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2768420.0, ans=0.1 2024-08-14 17:38:05,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2768420.0, ans=0.125 2024-08-14 17:38:09,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2768420.0, ans=0.125 2024-08-14 17:38:16,872 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 17:38:24,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2768520.0, ans=0.125 2024-08-14 17:38:37,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.247e+01 2.495e+01 2.740e+01 8.085e+01, threshold=4.990e+01, percent-clipped=1.0 2024-08-14 17:38:38,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2768620.0, ans=0.125 2024-08-14 17:38:41,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2768620.0, ans=0.0 2024-08-14 17:38:54,620 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
10 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 17:39:07,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2768820.0, ans=0.1 2024-08-14 17:39:07,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2768820.0, ans=0.125 2024-08-14 17:39:09,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2768820.0, ans=0.125 2024-08-14 17:39:25,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.28 vs. limit=22.5 2024-08-14 17:39:26,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1550, loss[loss=0.1216, beats_loss=0.008491, ecapa_loss=0.0001492, whisper_loss=0.1116, over 18252.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001518, whisper_loss=0.09061, over 3847843.19 frames. ], batch size: 66, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:39:31,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2768920.0, ans=0.1 2024-08-14 17:39:34,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2768920.0, ans=0.2 2024-08-14 17:39:39,380 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.553e-02 2024-08-14 17:39:40,594 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 28 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-14 17:39:42,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2769020.0, ans=0.0 2024-08-14 17:39:49,820 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 17:39:59,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2769120.0, ans=0.0 2024-08-14 17:40:05,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2769120.0, ans=0.0 2024-08-14 17:40:17,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2769220.0, ans=0.1 2024-08-14 17:40:27,725 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 17:40:31,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2769320.0, ans=0.125 2024-08-14 17:40:45,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1600, loss[loss=0.09324, beats_loss=0.01155, ecapa_loss=0.0001557, whisper_loss=0.08014, over 17746.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001503, whisper_loss=0.09007, over 3830861.64 frames. ], batch size: 72, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:40:46,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2769420.0, ans=10.0 2024-08-14 17:40:50,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2769420.0, ans=0.125 2024-08-14 17:40:55,138 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 17:40:56,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2769420.0, ans=0.0 2024-08-14 17:41:01,237 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 24 from LS+wenet, 14 from Vox, 15 fro AS 2024-08-14 17:41:04,725 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 17:41:17,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2769620.0, ans=0.2 2024-08-14 17:41:17,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.359e+01 2.603e+01 2.856e+01 4.128e+01, threshold=5.205e+01, percent-clipped=0.0 2024-08-14 17:41:21,414 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 17:41:48,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2769820.0, ans=0.0 2024-08-14 17:42:01,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1650, loss[loss=0.09115, beats_loss=0.01261, ecapa_loss=0.0001598, whisper_loss=0.07694, over 23466.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001491, whisper_loss=0.09005, over 3796796.03 frames. ], batch size: 97, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:42:14,158 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 17:42:18,599 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 17:42:39,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2770120.0, ans=0.0 2024-08-14 17:42:40,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2770120.0, ans=0.035 2024-08-14 17:42:56,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2770220.0, ans=0.0 2024-08-14 17:43:04,926 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
39 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 17:43:18,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1700, loss[loss=0.0641, beats_loss=0.01217, ecapa_loss=0.0001317, whisper_loss=0.05062, over 16265.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001502, whisper_loss=0.09001, over 3797362.31 frames. ], batch size: 62, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:43:23,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2770420.0, ans=0.125 2024-08-14 17:43:39,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2024-08-14 17:43:47,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2770620.0, ans=0.125 2024-08-14 17:43:49,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2770620.0, ans=0.125 2024-08-14 17:43:51,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.335e+01 2.576e+01 2.933e+01 1.462e+02, threshold=5.153e+01, percent-clipped=1.0 2024-08-14 17:44:13,919 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.402e-03 2024-08-14 17:44:18,366 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 17:44:18,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2770820.0, ans=0.125 2024-08-14 17:44:19,527 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 17:44:33,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2770920.0, ans=0.2 2024-08-14 17:44:33,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2770920.0, ans=0.125 2024-08-14 17:44:34,722 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1750, loss[loss=0.1245, beats_loss=0.006437, ecapa_loss=0.0001428, whisper_loss=0.1166, over 16431.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001498, whisper_loss=0.09028, over 3821898.82 frames. ], batch size: 58, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:44:51,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2771020.0, ans=0.125 2024-08-14 17:44:51,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2771020.0, ans=0.035 2024-08-14 17:44:53,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2771020.0, ans=0.1 2024-08-14 17:45:03,578 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 17:45:09,604 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 17:45:35,510 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 17:45:35,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2771320.0, ans=0.0 2024-08-14 17:45:50,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1800, loss[loss=0.1034, beats_loss=0.009522, ecapa_loss=0.0001676, whisper_loss=0.09216, over 16328.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001496, whisper_loss=0.09057, over 3822824.97 frames. ], batch size: 64, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:45:56,403 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 17:46:03,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2771520.0, ans=0.0 2024-08-14 17:46:05,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771520.0, ans=0.1 2024-08-14 17:46:15,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=12.0 2024-08-14 17:46:18,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771520.0, ans=0.1 2024-08-14 17:46:21,134 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 32 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 17:46:22,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.297e+01 2.564e+01 2.917e+01 8.345e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-14 17:46:23,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2771620.0, ans=0.0 2024-08-14 17:46:30,256 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-14 17:46:30,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2771620.0, ans=0.125 2024-08-14 17:46:35,265 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 17:46:35,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2771720.0, ans=0.0 2024-08-14 17:46:35,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2024-08-14 17:46:59,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2771820.0, ans=0.125 2024-08-14 17:46:59,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771820.0, ans=0.1 2024-08-14 17:47:06,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1850, loss[loss=0.1103, beats_loss=0.0111, ecapa_loss=0.000144, whisper_loss=0.09778, over 23017.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001508, whisper_loss=0.09071, over 3856333.80 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:47:10,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2771920.0, ans=0.125 2024-08-14 17:47:29,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2772020.0, ans=0.1 2024-08-14 17:48:21,548 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1900, loss[loss=0.1092, beats_loss=0.007953, ecapa_loss=0.0001523, whisper_loss=0.09973, over 17266.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001498, whisper_loss=0.09008, over 3860661.26 frames. 
], batch size: 63, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:48:27,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2772420.0, ans=0.0 2024-08-14 17:48:41,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2772520.0, ans=0.0 2024-08-14 17:48:48,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2772520.0, ans=0.0 2024-08-14 17:48:50,018 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 17:48:50,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-14 17:48:53,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2772620.0, ans=0.2 2024-08-14 17:48:54,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.272e+01 2.538e+01 2.800e+01 8.979e+01, threshold=5.075e+01, percent-clipped=2.0 2024-08-14 17:48:57,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2772620.0, ans=0.125 2024-08-14 17:49:12,810 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 17:49:21,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2772720.0, ans=0.125 2024-08-14 17:49:37,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 1950, loss[loss=0.1015, beats_loss=0.01012, ecapa_loss=0.0001906, whisper_loss=0.08947, over 21721.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001499, whisper_loss=0.09014, over 3883676.59 frames. 
], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:49:46,281 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 17:49:54,210 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 17:50:11,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2773120.0, ans=0.125 2024-08-14 17:50:11,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2773120.0, ans=0.125 2024-08-14 17:50:18,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2773120.0, ans=0.0 2024-08-14 17:50:25,243 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 17:50:31,730 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-14 17:50:44,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2773320.0, ans=0.1 2024-08-14 17:50:45,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2773320.0, ans=0.125 2024-08-14 17:50:51,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2773320.0, ans=0.125 2024-08-14 17:50:56,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2000, loss[loss=0.09814, beats_loss=0.01121, ecapa_loss=0.0001099, whisper_loss=0.08583, over 17339.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001499, whisper_loss=0.09021, over 3879755.38 frames. 
], batch size: 66, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:50:56,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2773420.0, ans=0.0 2024-08-14 17:51:04,245 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 17:51:29,331 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.380e+01 2.636e+01 2.886e+01 1.186e+02, threshold=5.271e+01, percent-clipped=1.0 2024-08-14 17:51:45,359 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 17:51:46,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.79 vs. limit=5.0 2024-08-14 17:52:00,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=12.0 2024-08-14 17:52:07,155 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 17:52:10,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2773820.0, ans=0.95 2024-08-14 17:52:14,402 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2050, loss[loss=0.09081, beats_loss=0.01182, ecapa_loss=0.0001485, whisper_loss=0.0775, over 19630.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001512, whisper_loss=0.09039, over 3862732.65 frames. 
], batch size: 76, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:52:29,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2774020.0, ans=0.0 2024-08-14 17:52:53,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2774120.0, ans=0.125 2024-08-14 17:53:03,861 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 17:53:09,879 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 17:53:13,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=12.0 2024-08-14 17:53:20,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2774320.0, ans=0.125 2024-08-14 17:53:30,565 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2100, loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001425, whisper_loss=0.09179, over 18178.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001496, whisper_loss=0.09011, over 3850072.17 frames. ], batch size: 73, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:53:31,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2024-08-14 17:53:32,507 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
32 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 17:53:51,043 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:53:53,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2774520.0, ans=0.0 2024-08-14 17:53:53,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2774520.0, ans=0.0 2024-08-14 17:53:57,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.99 vs. limit=15.0 2024-08-14 17:53:58,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2774520.0, ans=0.0 2024-08-14 17:54:03,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.339e+01 2.575e+01 2.832e+01 4.254e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-14 17:54:14,751 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 17:54:31,591 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 17:54:41,284 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 17:54:41,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2774820.0, ans=0.2 2024-08-14 17:54:48,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2150, loss[loss=0.1037, beats_loss=0.01202, ecapa_loss=0.0001335, whisper_loss=0.09029, over 22393.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001497, whisper_loss=0.0899, over 3842333.87 frames. 
], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:55:33,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2775220.0, ans=0.125 2024-08-14 17:55:33,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2775220.0, ans=0.95 2024-08-14 17:55:54,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2775320.0, ans=0.125 2024-08-14 17:56:06,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2200, loss[loss=0.09441, beats_loss=0.01296, ecapa_loss=0.0001374, whisper_loss=0.08008, over 16217.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001499, whisper_loss=0.09023, over 3850270.55 frames. ], batch size: 63, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:56:12,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2775420.0, ans=0.07 2024-08-14 17:56:22,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2775520.0, ans=0.0 2024-08-14 17:56:36,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.388e+01 2.686e+01 3.163e+01 6.240e+01, threshold=5.371e+01, percent-clipped=1.0 2024-08-14 17:56:49,033 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 17:56:56,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2775720.0, ans=0.2 2024-08-14 17:56:56,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2775720.0, ans=0.0 2024-08-14 17:57:01,598 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
30 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-14 17:57:08,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2775820.0, ans=0.125 2024-08-14 17:57:14,343 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 17:57:21,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2250, loss[loss=0.1018, beats_loss=0.0114, ecapa_loss=0.0001565, whisper_loss=0.08888, over 20440.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001506, whisper_loss=0.09124, over 3862625.32 frames. ], batch size: 82, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:57:24,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2775920.0, ans=0.0 2024-08-14 17:57:36,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2776020.0, ans=0.125 2024-08-14 17:57:39,838 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:57:41,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2776020.0, ans=0.125 2024-08-14 17:57:42,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2776020.0, ans=0.125 2024-08-14 17:57:45,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2776020.0, ans=0.125 2024-08-14 17:57:47,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2776020.0, ans=0.125 2024-08-14 17:57:55,567 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2776120.0, ans=0.125 2024-08-14 17:57:59,021 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 17:58:06,215 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 17:58:09,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2776220.0, ans=0.125 2024-08-14 17:58:40,934 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2300, loss[loss=0.0913, beats_loss=0.01092, ecapa_loss=0.000138, whisper_loss=0.079, over 22689.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001505, whisper_loss=0.09079, over 3876835.22 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:58:49,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-14 17:58:59,037 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-14 17:59:12,861 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.401e+01 2.652e+01 3.055e+01 1.168e+02, threshold=5.304e+01, percent-clipped=4.0 2024-08-14 17:59:19,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2776620.0, ans=0.2 2024-08-14 17:59:34,325 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
28 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 17:59:40,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2776820.0, ans=0.125 2024-08-14 17:59:42,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2776820.0, ans=0.1 2024-08-14 17:59:52,875 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-14 17:59:53,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2776820.0, ans=0.2 2024-08-14 17:59:57,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2350, loss[loss=0.1148, beats_loss=0.009003, ecapa_loss=0.0001282, whisper_loss=0.1045, over 17929.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001504, whisper_loss=0.09118, over 3867723.74 frames. ], batch size: 66, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:00:03,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2776920.0, ans=0.125 2024-08-14 18:00:06,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2776920.0, ans=0.1 2024-08-14 18:00:08,752 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 18:00:10,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2776920.0, ans=0.0 2024-08-14 18:00:24,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. 
limit=22.5 2024-08-14 18:00:33,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2777120.0, ans=0.2 2024-08-14 18:00:42,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2777120.0, ans=0.125 2024-08-14 18:00:44,994 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 18:00:51,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2777220.0, ans=0.1 2024-08-14 18:00:53,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2777220.0, ans=0.125 2024-08-14 18:00:59,466 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 18:01:08,241 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 18:01:11,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2777320.0, ans=0.2 2024-08-14 18:01:19,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2400, loss[loss=0.09325, beats_loss=0.01182, ecapa_loss=8.737e-05, whisper_loss=0.08056, over 21069.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001515, whisper_loss=0.09129, over 3883533.71 frames. 
], batch size: 77, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:01:28,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2777420.0, ans=0.1 2024-08-14 18:01:52,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.341e+01 2.588e+01 3.015e+01 2.629e+02, threshold=5.175e+01, percent-clipped=2.0 2024-08-14 18:02:05,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2024-08-14 18:02:19,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2777720.0, ans=0.125 2024-08-14 18:02:39,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2777820.0, ans=0.0 2024-08-14 18:02:42,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2450, loss[loss=0.08564, beats_loss=0.01229, ecapa_loss=0.0001737, whisper_loss=0.07161, over 14399.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001519, whisper_loss=0.0913, over 3850435.39 frames. ], batch size: 61, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:02:49,707 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 18:02:49,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2777920.0, ans=0.1 2024-08-14 18:03:04,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2778020.0, ans=0.125 2024-08-14 18:03:10,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2778020.0, ans=0.0 2024-08-14 18:03:25,696 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 18:03:28,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2778120.0, ans=0.04949747468305833 2024-08-14 18:03:36,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2778220.0, ans=0.0 2024-08-14 18:03:46,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2778320.0, ans=0.1 2024-08-14 18:03:46,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2778320.0, ans=0.0 2024-08-14 18:03:47,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2778320.0, ans=0.0 2024-08-14 18:03:52,982 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-14 18:04:03,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2500, loss[loss=0.09258, beats_loss=0.01054, ecapa_loss=0.0001466, whisper_loss=0.08058, over 17207.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001522, whisper_loss=0.09039, over 3829271.08 frames. 
], batch size: 67, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:04:05,768 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 18:04:09,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2778420.0, ans=0.05 2024-08-14 18:04:11,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2024-08-14 18:04:11,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2024-08-14 18:04:16,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2778420.0, ans=0.125 2024-08-14 18:04:24,158 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 18:04:36,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2778620.0, ans=0.125 2024-08-14 18:04:36,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2778620.0, ans=0.0 2024-08-14 18:04:39,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.393e+01 2.682e+01 2.958e+01 4.919e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-14 18:04:39,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2778620.0, ans=0.125 2024-08-14 18:05:12,605 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 18:05:14,207 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
16 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 18:05:24,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2550, loss[loss=0.08393, beats_loss=0.01384, ecapa_loss=0.000131, whisper_loss=0.06878, over 17382.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001515, whisper_loss=0.0902, over 3816570.82 frames. ], batch size: 70, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:05:33,965 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 18:05:46,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2779020.0, ans=0.1 2024-08-14 18:05:51,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2779020.0, ans=0.0 2024-08-14 18:05:52,970 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 18:06:05,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2779120.0, ans=0.2 2024-08-14 18:06:10,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.81 vs. limit=10.0 2024-08-14 18:06:16,968 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-14 18:06:35,004 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 18:06:42,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2779320.0, ans=0.2 2024-08-14 18:06:43,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2779320.0, ans=0.2 2024-08-14 18:06:46,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2600, loss[loss=0.07704, beats_loss=0.01013, ecapa_loss=0.0001468, whisper_loss=0.06544, over 16315.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001517, whisper_loss=0.09052, over 3840843.38 frames. ], batch size: 64, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:06:46,738 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 18:06:51,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2779420.0, ans=0.125 2024-08-14 18:07:02,218 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 18:07:08,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2779520.0, ans=0.125 2024-08-14 18:07:11,959 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:07:13,194 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 18:07:13,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2779520.0, ans=0.125 2024-08-14 18:07:15,218 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 18:07:16,717 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 18:07:16,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2779520.0, ans=0.2 2024-08-14 18:07:21,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.268e+01 2.541e+01 2.782e+01 4.582e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-14 18:07:23,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2779620.0, ans=0.1 2024-08-14 18:07:25,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2779620.0, ans=0.125 2024-08-14 18:07:26,895 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 8 from Vox, 41 fro AS 2024-08-14 18:07:28,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2779620.0, ans=0.0 2024-08-14 18:08:07,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2650, loss[loss=0.07858, beats_loss=0.01088, ecapa_loss=0.00019, whisper_loss=0.06579, over 16529.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01067, ecapa_loss=0.0001522, whisper_loss=0.08972, over 3837713.92 frames. ], batch size: 72, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:08:09,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2779920.0, ans=0.125 2024-08-14 18:08:37,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2780020.0, ans=0.2 2024-08-14 18:09:09,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2780220.0, ans=0.0 2024-08-14 18:09:10,669 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 18:09:25,003 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 18:09:29,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2700, loss[loss=0.1192, beats_loss=0.00922, ecapa_loss=0.0001646, whisper_loss=0.1084, over 21889.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0107, ecapa_loss=0.0001513, whisper_loss=0.08934, over 3833484.04 frames. ], batch size: 86, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:09:36,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2780420.0, ans=0.125 2024-08-14 18:09:39,086 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-14 18:09:49,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=10.0 2024-08-14 18:09:52,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.94 vs. 
limit=22.5 2024-08-14 18:09:55,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2780520.0, ans=0.05 2024-08-14 18:10:03,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.336e+01 2.550e+01 2.927e+01 5.134e+01, threshold=5.101e+01, percent-clipped=1.0 2024-08-14 18:10:04,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2780620.0, ans=0.0 2024-08-14 18:10:10,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2780620.0, ans=0.125 2024-08-14 18:10:15,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2780620.0, ans=0.125 2024-08-14 18:10:15,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2780620.0, ans=0.125 2024-08-14 18:10:35,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2780820.0, ans=0.09899494936611666 2024-08-14 18:10:43,884 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 12 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 18:10:52,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2750, loss[loss=0.07719, beats_loss=0.01121, ecapa_loss=0.0001341, whisper_loss=0.06464, over 16702.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01068, ecapa_loss=0.0001519, whisper_loss=0.08948, over 3844078.96 frames. ], batch size: 62, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:10:59,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2024-08-14 18:11:09,177 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 18:11:13,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-08-14 18:11:16,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781020.0, ans=0.1 2024-08-14 18:11:25,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2781020.0, ans=0.2 2024-08-14 18:11:30,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2781120.0, ans=0.125 2024-08-14 18:11:34,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2781120.0, ans=0.0 2024-08-14 18:11:35,812 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-14 18:11:44,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781220.0, ans=0.1 2024-08-14 18:11:49,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2781220.0, ans=0.125 2024-08-14 18:11:49,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2781220.0, ans=0.0 2024-08-14 18:11:57,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2781220.0, ans=0.125 2024-08-14 18:12:14,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2781420.0, ans=0.1 2024-08-14 18:12:16,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2800, loss[loss=0.1045, beats_loss=0.009558, ecapa_loss=0.0001406, whisper_loss=0.09351, over 20028.00 
frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001517, whisper_loss=0.09065, over 3857527.49 frames. ], batch size: 76, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:12:29,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781520.0, ans=0.1 2024-08-14 18:12:39,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-08-14 18:12:48,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.377e+01 2.677e+01 2.938e+01 4.458e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-14 18:12:52,572 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 18:12:54,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2781620.0, ans=0.0 2024-08-14 18:13:22,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2781820.0, ans=0.125 2024-08-14 18:13:27,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2781820.0, ans=0.04949747468305833 2024-08-14 18:13:27,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781820.0, ans=0.1 2024-08-14 18:13:28,808 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 18:13:33,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2850, loss[loss=0.08855, beats_loss=0.01417, ecapa_loss=0.000116, whisper_loss=0.07321, over 19853.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001509, whisper_loss=0.09098, over 3851620.22 frames. 
], batch size: 79, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:13:35,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2781920.0, ans=0.125 2024-08-14 18:13:35,257 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.865e+00 2024-08-14 18:13:38,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2781920.0, ans=0.1 2024-08-14 18:13:39,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2781920.0, ans=0.125 2024-08-14 18:13:41,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2781920.0, ans=0.95 2024-08-14 18:14:09,103 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 40 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 18:14:48,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2900, loss[loss=0.098, beats_loss=0.01135, ecapa_loss=0.0001227, whisper_loss=0.08542, over 14203.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001523, whisper_loss=0.09113, over 3870459.07 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:14:58,966 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 18:14:59,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2782420.0, ans=0.0 2024-08-14 18:15:08,856 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 18:15:18,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.302e+01 2.501e+01 2.806e+01 3.501e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-14 18:15:44,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2782720.0, ans=0.0 2024-08-14 18:15:58,168 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 18:16:03,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 2950, loss[loss=0.0908, beats_loss=0.01289, ecapa_loss=0.0001293, whisper_loss=0.07662, over 17660.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001529, whisper_loss=0.09108, over 3883948.18 frames. ], batch size: 68, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:16:25,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2783020.0, ans=0.125 2024-08-14 18:16:34,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-08-14 18:16:39,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2783120.0, ans=0.125 2024-08-14 18:16:41,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2783120.0, ans=0.0 2024-08-14 18:17:00,721 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 18:17:12,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.88 vs. 
limit=15.0 2024-08-14 18:17:17,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2024-08-14 18:17:18,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3000, loss[loss=0.09878, beats_loss=0.01002, ecapa_loss=0.0001857, whisper_loss=0.0869, over 18230.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001537, whisper_loss=0.09092, over 3909088.73 frames. ], batch size: 75, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:17:18,137 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 18:17:58,795 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on ASR_libri: loss=0.2511, beats_loss=0, ecapa_loss=0.0005401, whisper_loss=0.2457, over 922467.00 frames. 2024-08-14 18:18:19,604 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on SV_voxceleb1: loss=0.004329, beats_loss=0, ecapa_loss=0.0004329, whisper_loss=0, over 939242.00 frames. 2024-08-14 18:20:16,584 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 18:20:16,589 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 18:20:33,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2783520.0, ans=0.125 2024-08-14 18:20:38,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2783520.0, ans=0.0 2024-08-14 18:20:48,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.403e+01 2.631e+01 2.938e+01 2.975e+02, threshold=5.261e+01, percent-clipped=1.0 2024-08-14 18:20:52,431 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 18:21:04,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.44 vs. limit=10.0 2024-08-14 18:21:19,697 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 18:21:25,723 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.484e+00 2024-08-14 18:21:30,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3050, loss[loss=0.09262, beats_loss=0.01156, ecapa_loss=0.0001219, whisper_loss=0.07985, over 20080.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.000154, whisper_loss=0.09189, over 3946453.94 frames. ], batch size: 76, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:21:35,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2783920.0, ans=0.125 2024-08-14 18:22:07,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784120.0, ans=0.1 2024-08-14 18:22:16,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-08-14 18:22:25,735 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-14 18:22:27,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2784220.0, ans=0.0 2024-08-14 18:22:35,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2784320.0, ans=10.0 2024-08-14 18:22:37,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2784320.0, ans=0.125 2024-08-14 18:22:43,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2784320.0, ans=0.125 2024-08-14 18:22:46,334 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3100, loss[loss=0.1065, beats_loss=0.01069, ecapa_loss=0.0001522, whisper_loss=0.0943, over 16132.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.0001549, whisper_loss=0.09176, over 3926207.56 frames. ], batch size: 66, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:22:48,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2784420.0, ans=0.95 2024-08-14 18:22:58,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2784420.0, ans=0.125 2024-08-14 18:23:04,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2784520.0, ans=0.125 2024-08-14 18:23:13,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2784620.0, ans=0.125 2024-08-14 18:23:14,938 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
22 from LS+wenet, 36 from Vox, 36 fro AS 2024-08-14 18:23:16,109 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.365e+01 2.545e+01 2.848e+01 4.706e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-14 18:23:17,746 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 18:23:24,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2784620.0, ans=0.125 2024-08-14 18:23:28,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.84 vs. limit=10.0 2024-08-14 18:23:29,375 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 18:23:35,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2784720.0, ans=0.125 2024-08-14 18:23:56,750 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3150, loss[loss=0.08319, beats_loss=0.0102, ecapa_loss=0.0001654, whisper_loss=0.07133, over 14936.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001554, whisper_loss=0.09116, over 3903922.91 frames. ], batch size: 60, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:24:09,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2785020.0, ans=0.2 2024-08-14 18:24:12,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2785020.0, ans=0.0 2024-08-14 18:24:13,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2785020.0, ans=0.1 2024-08-14 18:24:23,270 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 18:24:23,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2785120.0, ans=10.0 2024-08-14 18:24:49,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2785220.0, ans=0.125 2024-08-14 18:24:50,376 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 18:25:06,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3200, loss[loss=0.09043, beats_loss=0.01153, ecapa_loss=0.0001577, whisper_loss=0.07732, over 21498.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001555, whisper_loss=0.09154, over 3909992.89 frames. ], batch size: 87, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:25:10,474 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-14 18:25:11,729 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 18:25:12,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.22 vs. 
limit=22.5 2024-08-14 18:25:20,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2785520.0, ans=0.0 2024-08-14 18:25:27,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2785520.0, ans=0.125 2024-08-14 18:25:29,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2785520.0, ans=0.125 2024-08-14 18:25:35,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.274e+01 2.528e+01 2.834e+01 7.598e+01, threshold=5.056e+01, percent-clipped=2.0 2024-08-14 18:25:43,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2785620.0, ans=0.2 2024-08-14 18:25:48,149 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 18:25:54,753 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 18:25:55,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-08-14 18:26:04,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2785820.0, ans=0.125 2024-08-14 18:26:07,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2024-08-14 18:26:15,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3250, loss[loss=0.1041, beats_loss=0.01246, ecapa_loss=0.0001279, whisper_loss=0.09037, over 20772.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.09227, over 3907129.33 frames. 
], batch size: 82, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:26:16,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2785920.0, ans=0.125 2024-08-14 18:26:19,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2785920.0, ans=0.125 2024-08-14 18:26:20,443 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 18:26:26,042 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 18:27:05,206 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 18:27:22,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3300, loss[loss=0.09951, beats_loss=0.01021, ecapa_loss=0.0002026, whisper_loss=0.08727, over 18260.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01066, ecapa_loss=0.0001562, whisper_loss=0.09242, over 3917022.13 frames. ], batch size: 75, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:27:28,314 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 18:27:31,327 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 18:27:35,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2786520.0, ans=0.125 2024-08-14 18:27:51,324 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.316e+01 2.463e+01 2.771e+01 4.814e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-14 18:28:22,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2786820.0, ans=0.0 2024-08-14 18:28:25,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2786820.0, ans=0.125 2024-08-14 18:28:30,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3350, loss[loss=0.1249, beats_loss=0.008351, ecapa_loss=0.000145, whisper_loss=0.1151, over 22169.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01064, ecapa_loss=0.0001557, whisper_loss=0.09224, over 3941584.86 frames. ], batch size: 83, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:28:37,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2786920.0, ans=0.0 2024-08-14 18:28:50,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2787020.0, ans=0.125 2024-08-14 18:28:52,983 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 18:28:58,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2787120.0, ans=0.0 2024-08-14 18:29:02,403 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-14 18:29:16,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.41 vs. limit=10.0 2024-08-14 18:29:18,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2787220.0, ans=0.0 2024-08-14 18:29:39,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3400, loss[loss=0.1137, beats_loss=0.009769, ecapa_loss=0.0001407, whisper_loss=0.1025, over 23658.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01055, ecapa_loss=0.0001553, whisper_loss=0.0926, over 3928855.17 frames. ], batch size: 93, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:29:42,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2787420.0, ans=0.125 2024-08-14 18:29:43,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2787420.0, ans=0.125 2024-08-14 18:29:49,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2787420.0, ans=0.2 2024-08-14 18:30:07,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.360e+01 2.659e+01 3.040e+01 2.409e+02, threshold=5.318e+01, percent-clipped=1.0 2024-08-14 18:30:24,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2787720.0, ans=0.0 2024-08-14 18:30:34,683 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-14 18:30:48,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3450, loss[loss=0.1286, beats_loss=0.01001, ecapa_loss=0.0001277, whisper_loss=0.1173, over 24269.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001563, whisper_loss=0.09181, over 3908215.35 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:30:53,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2787920.0, ans=0.125 2024-08-14 18:30:56,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2787920.0, ans=0.125 2024-08-14 18:31:07,988 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 18:31:09,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2788020.0, ans=0.0 2024-08-14 18:31:20,079 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 18:31:29,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2788220.0, ans=0.0 2024-08-14 18:31:38,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2788220.0, ans=0.125 2024-08-14 18:31:46,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.20 vs. limit=10.0 2024-08-14 18:31:55,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3500, loss[loss=0.1, beats_loss=0.01107, ecapa_loss=0.0001458, whisper_loss=0.08748, over 22815.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001565, whisper_loss=0.09142, over 3874837.33 frames. 
], batch size: 93, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:31:58,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2788420.0, ans=0.0 2024-08-14 18:32:09,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2788520.0, ans=0.0 2024-08-14 18:32:13,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2788520.0, ans=0.125 2024-08-14 18:32:23,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.371e+01 2.585e+01 2.886e+01 6.376e+01, threshold=5.170e+01, percent-clipped=1.0 2024-08-14 18:32:25,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2788620.0, ans=0.2 2024-08-14 18:32:29,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2024-08-14 18:32:41,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2788720.0, ans=0.0 2024-08-14 18:32:44,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2788720.0, ans=0.0 2024-08-14 18:32:44,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2788720.0, ans=0.0 2024-08-14 18:32:45,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2788720.0, ans=0.125 2024-08-14 18:32:47,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=15.0 2024-08-14 18:32:53,705 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 18:32:54,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.83 vs. limit=6.0 2024-08-14 18:32:55,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2788820.0, ans=0.0 2024-08-14 18:32:55,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2788820.0, ans=0.125 2024-08-14 18:33:03,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3550, loss[loss=0.09208, beats_loss=0.01159, ecapa_loss=0.0001553, whisper_loss=0.07894, over 18669.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001555, whisper_loss=0.09109, over 3879641.57 frames. ], batch size: 75, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:33:14,332 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 18:33:17,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-08-14 18:33:21,327 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-14 18:33:43,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2789220.0, ans=0.1 2024-08-14 18:33:43,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2789220.0, ans=0.125 2024-08-14 18:34:06,172 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
37 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 18:34:11,514 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3600, loss[loss=0.1202, beats_loss=0.007523, ecapa_loss=0.0001768, whisper_loss=0.1109, over 17967.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001558, whisper_loss=0.09109, over 3854910.57 frames. ], batch size: 69, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:34:16,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=12.0 2024-08-14 18:34:33,498 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 18:34:39,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.323e+01 2.540e+01 2.892e+01 4.287e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-14 18:34:52,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2789720.0, ans=0.0 2024-08-14 18:35:13,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-14 18:35:14,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2789820.0, ans=0.1 2024-08-14 18:35:17,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2789820.0, ans=0.0 2024-08-14 18:35:19,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3650, loss[loss=0.09093, beats_loss=0.01224, ecapa_loss=0.0001761, whisper_loss=0.07692, over 13807.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001541, whisper_loss=0.09088, over 3841963.71 frames. 
], batch size: 59, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:36:14,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2790320.0, ans=0.125 2024-08-14 18:36:25,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2790420.0, ans=0.125 2024-08-14 18:36:26,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3700, loss[loss=0.08812, beats_loss=0.01085, ecapa_loss=0.0001266, whisper_loss=0.07601, over 14603.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001537, whisper_loss=0.0906, over 3824821.02 frames. ], batch size: 58, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:36:27,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2790420.0, ans=0.125 2024-08-14 18:36:31,851 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 18:36:33,186 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 18:36:54,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.282e+01 2.542e+01 2.895e+01 4.405e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 18:36:55,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.96 vs. limit=15.0 2024-08-14 18:36:59,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2790620.0, ans=0.0 2024-08-14 18:37:03,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2790620.0, ans=0.0 2024-08-14 18:37:04,405 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 18:37:32,326 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 39 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 18:37:33,787 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3750, loss[loss=0.1384, beats_loss=0.009518, ecapa_loss=0.0001548, whisper_loss=0.1274, over 23659.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001536, whisper_loss=0.0912, over 3865583.62 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:37:40,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790920.0, ans=0.1 2024-08-14 18:37:47,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2791020.0, ans=0.0 2024-08-14 18:37:58,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2791020.0, ans=0.1 2024-08-14 18:38:06,695 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 18:38:08,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2791120.0, ans=0.2 2024-08-14 18:38:09,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2791120.0, ans=0.0 2024-08-14 18:38:09,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. 
limit=15.0 2024-08-14 18:38:13,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2791220.0, ans=0.125 2024-08-14 18:38:41,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3800, loss[loss=0.07103, beats_loss=0.01061, ecapa_loss=0.0001964, whisper_loss=0.05846, over 14447.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001529, whisper_loss=0.09048, over 3843137.32 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:38:45,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2791420.0, ans=0.2 2024-08-14 18:39:00,146 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 18:39:03,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2791520.0, ans=0.0 2024-08-14 18:39:06,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2024-08-14 18:39:09,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.383e+01 2.672e+01 2.913e+01 4.805e+01, threshold=5.345e+01, percent-clipped=0.0 2024-08-14 18:39:15,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2791620.0, ans=0.0 2024-08-14 18:39:36,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2791820.0, ans=0.025 2024-08-14 18:39:48,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3850, loss[loss=0.09425, beats_loss=0.01062, ecapa_loss=0.0001681, whisper_loss=0.08195, over 20740.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001527, whisper_loss=0.09072, over 3879779.43 frames. 
], batch size: 84, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:39:55,278 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 18:40:04,304 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 18:40:24,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-08-14 18:40:30,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2792220.0, ans=0.0 2024-08-14 18:40:30,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2024-08-14 18:40:38,166 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 18:40:42,473 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 40 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 18:40:48,032 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 18:40:55,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3900, loss[loss=0.06977, beats_loss=0.01164, ecapa_loss=0.0001558, whisper_loss=0.05657, over 14198.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001545, whisper_loss=0.09071, over 3891253.90 frames. 
], batch size: 59, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:41:00,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2792420.0, ans=0.125 2024-08-14 18:41:16,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2792520.0, ans=0.125 2024-08-14 18:41:17,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=12.0 2024-08-14 18:41:17,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2792520.0, ans=0.125 2024-08-14 18:41:24,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.416e+01 2.719e+01 3.088e+01 3.540e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-14 18:41:28,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2792620.0, ans=0.0 2024-08-14 18:41:48,122 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 18:41:54,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2792820.0, ans=0.125 2024-08-14 18:42:03,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 3950, loss[loss=0.08255, beats_loss=0.01289, ecapa_loss=0.0001475, whisper_loss=0.06819, over 22049.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001544, whisper_loss=0.09056, over 3914184.11 frames. 
], batch size: 92, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:42:04,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2792920.0, ans=0.125 2024-08-14 18:42:40,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2793120.0, ans=0.1 2024-08-14 18:42:47,115 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 34 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 18:42:47,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2793220.0, ans=0.125 2024-08-14 18:42:50,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-14 18:42:51,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2793220.0, ans=0.07 2024-08-14 18:43:09,425 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-14 18:43:09,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2793420.0, ans=0.125 2024-08-14 18:43:10,701 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4000, loss[loss=0.111, beats_loss=0.01159, ecapa_loss=0.0001179, whisper_loss=0.09819, over 23652.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.000154, whisper_loss=0.09133, over 3925917.30 frames. 
], batch size: 90, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:43:11,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2793420.0, ans=0.0 2024-08-14 18:43:12,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2793420.0, ans=0.09899494936611666 2024-08-14 18:43:34,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2793520.0, ans=0.125 2024-08-14 18:43:39,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.386e+01 2.659e+01 3.102e+01 4.594e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 18:43:42,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2793620.0, ans=0.1 2024-08-14 18:43:59,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2793720.0, ans=0.125 2024-08-14 18:44:05,959 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 33 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 18:44:16,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2793820.0, ans=0.5 2024-08-14 18:44:19,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4050, loss[loss=0.08174, beats_loss=0.01238, ecapa_loss=0.0001302, whisper_loss=0.06806, over 20895.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.0001532, whisper_loss=0.09163, over 3930802.46 frames. ], batch size: 85, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:44:23,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. 
limit=15.0 2024-08-14 18:44:40,029 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 18:44:44,310 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 18:44:53,751 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 18:45:00,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.01 vs. limit=22.5 2024-08-14 18:45:21,094 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 18:45:27,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4100, loss[loss=0.1052, beats_loss=0.01075, ecapa_loss=0.0001894, whisper_loss=0.09257, over 21696.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01058, ecapa_loss=0.0001537, whisper_loss=0.09205, over 3930426.22 frames. ], batch size: 87, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:45:30,877 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 18:45:35,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794420.0, ans=0.1 2024-08-14 18:45:46,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2794520.0, ans=0.2 2024-08-14 18:45:53,176 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 18:45:57,074 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.354e+01 2.603e+01 2.918e+01 6.130e+01, threshold=5.207e+01, percent-clipped=1.0 2024-08-14 18:46:06,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2794620.0, ans=10.0 2024-08-14 18:46:12,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2794720.0, ans=0.125 2024-08-14 18:46:30,546 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 18:46:36,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4150, loss[loss=0.1174, beats_loss=0.009964, ecapa_loss=0.0001542, whisper_loss=0.1059, over 23190.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001548, whisper_loss=0.09153, over 3891803.49 frames. ], batch size: 91, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:46:39,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2794920.0, ans=0.0 2024-08-14 18:46:42,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2024-08-14 18:46:51,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2795020.0, ans=0.0 2024-08-14 18:46:51,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2795020.0, ans=0.0 2024-08-14 18:47:02,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.92 vs. 
limit=15.0 2024-08-14 18:47:03,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2795120.0, ans=0.125 2024-08-14 18:47:08,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2795120.0, ans=0.0 2024-08-14 18:47:10,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2024-08-14 18:47:19,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2795220.0, ans=0.0 2024-08-14 18:47:30,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2795320.0, ans=0.2 2024-08-14 18:47:38,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2795320.0, ans=0.0 2024-08-14 18:47:44,041 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4200, loss[loss=0.1053, beats_loss=0.01134, ecapa_loss=0.0001213, whisper_loss=0.09272, over 20591.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.000154, whisper_loss=0.0906, over 3878911.69 frames. ], batch size: 81, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:47:44,223 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 18:47:55,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. 
limit=15.0 2024-08-14 18:48:12,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.416e+01 2.672e+01 2.930e+01 6.892e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 18:48:25,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2795720.0, ans=0.1 2024-08-14 18:48:43,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-14 18:48:44,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2795820.0, ans=0.125 2024-08-14 18:48:48,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2795820.0, ans=0.125 2024-08-14 18:48:51,157 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-14 18:48:51,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2795920.0, ans=0.04949747468305833 2024-08-14 18:48:52,308 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4250, loss[loss=0.112, beats_loss=0.01091, ecapa_loss=0.0001562, whisper_loss=0.09955, over 15189.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001537, whisper_loss=0.0904, over 3863207.08 frames. 
], batch size: 58, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:48:58,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2795920.0, ans=0.125 2024-08-14 18:48:59,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2795920.0, ans=0.2 2024-08-14 18:49:12,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2796020.0, ans=0.125 2024-08-14 18:49:22,648 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 28 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-14 18:49:30,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2796120.0, ans=0.09899494936611666 2024-08-14 18:49:42,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2796220.0, ans=0.0 2024-08-14 18:49:53,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2796320.0, ans=0.0 2024-08-14 18:49:56,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2796320.0, ans=0.125 2024-08-14 18:50:02,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4300, loss[loss=0.1026, beats_loss=0.008046, ecapa_loss=0.0002008, whisper_loss=0.09253, over 15826.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001549, whisper_loss=0.09069, over 3849465.51 frames. ], batch size: 66, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:50:15,548 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-14 18:50:21,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2796520.0, ans=0.5 2024-08-14 18:50:21,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2796520.0, ans=0.0 2024-08-14 18:50:27,863 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 18:50:34,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.393e+01 2.675e+01 3.079e+01 4.317e+01, threshold=5.351e+01, percent-clipped=0.0 2024-08-14 18:50:40,338 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 14 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 18:50:43,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2796620.0, ans=0.125 2024-08-14 18:50:43,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2796620.0, ans=0.125 2024-08-14 18:50:59,500 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 18:50:59,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2796720.0, ans=0.0 2024-08-14 18:51:02,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0 2024-08-14 18:51:18,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4350, loss[loss=0.1066, beats_loss=0.01131, ecapa_loss=0.0001397, whisper_loss=0.09386, over 15833.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001549, whisper_loss=0.09094, over 3855370.36 frames. 
], batch size: 62, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:51:23,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2796920.0, ans=0.125 2024-08-14 18:51:28,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2024-08-14 18:51:33,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2797020.0, ans=0.0 2024-08-14 18:51:57,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2797120.0, ans=0.0 2024-08-14 18:52:28,750 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 18:52:29,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797320.0, ans=0.1 2024-08-14 18:52:29,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2797320.0, ans=0.125 2024-08-14 18:52:32,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4400, loss[loss=0.1085, beats_loss=0.01094, ecapa_loss=0.0001564, whisper_loss=0.09603, over 21702.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001537, whisper_loss=0.09138, over 3866555.71 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:52:33,104 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 18:52:39,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2797420.0, ans=0.2 2024-08-14 18:52:52,604 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.96 vs. 
limit=12.0 2024-08-14 18:53:04,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.379e+01 2.659e+01 2.952e+01 7.187e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-14 18:53:08,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2797620.0, ans=0.1 2024-08-14 18:53:39,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2797820.0, ans=0.2 2024-08-14 18:53:41,226 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 18:53:42,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2797820.0, ans=0.0 2024-08-14 18:53:48,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4450, loss[loss=0.1224, beats_loss=0.008603, ecapa_loss=0.0001475, whisper_loss=0.1123, over 16010.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001542, whisper_loss=0.09134, over 3876862.44 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:53:50,715 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 18:54:01,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2797920.0, ans=0.0 2024-08-14 18:54:04,319 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 18:54:07,615 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.282e+01 2024-08-14 18:54:13,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. 
limit=15.0 2024-08-14 18:54:26,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2798120.0, ans=0.125 2024-08-14 18:54:34,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2798220.0, ans=0.04949747468305833 2024-08-14 18:54:40,916 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 18:54:56,044 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 18:55:01,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2798320.0, ans=0.09899494936611666 2024-08-14 18:55:04,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2798320.0, ans=0.0 2024-08-14 18:55:06,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4500, loss[loss=0.1236, beats_loss=0.008475, ecapa_loss=0.000198, whisper_loss=0.1132, over 21623.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001553, whisper_loss=0.091, over 3877886.22 frames. ], batch size: 87, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:55:16,195 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 18:55:22,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2798520.0, ans=0.0 2024-08-14 18:55:42,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.288e+01 2.644e+01 2.918e+01 3.847e+02, threshold=5.287e+01, percent-clipped=3.0 2024-08-14 18:55:49,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2798620.0, ans=0.1 2024-08-14 18:56:26,119 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4550, loss[loss=0.08448, beats_loss=0.01311, ecapa_loss=0.0001434, whisper_loss=0.06994, over 20937.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.000157, whisper_loss=0.09069, over 3862020.16 frames. ], batch size: 86, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:56:36,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2798920.0, ans=0.125 2024-08-14 18:56:37,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2798920.0, ans=0.1 2024-08-14 18:56:39,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2798920.0, ans=0.0 2024-08-14 18:56:50,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-08-14 18:56:52,892 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 18:56:56,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2799120.0, ans=0.1 2024-08-14 18:56:58,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2799120.0, ans=0.125 2024-08-14 18:57:01,851 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.742e+05 2024-08-14 18:57:06,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2799120.0, ans=0.125 2024-08-14 18:57:11,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2799120.0, ans=0.125 2024-08-14 18:57:30,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2799320.0, ans=0.0 2024-08-14 18:57:43,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4600, loss[loss=0.07699, beats_loss=0.01326, ecapa_loss=0.0001786, whisper_loss=0.06195, over 21426.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001587, whisper_loss=0.09025, over 3841491.05 frames. ], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:57:48,053 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 18:57:48,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2799420.0, ans=0.2 2024-08-14 18:57:56,866 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 18:58:03,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2024-08-14 18:58:15,989 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.391e+01 2.667e+01 2.855e+01 4.020e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-14 18:58:26,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-14 18:58:32,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2799720.0, ans=0.125 2024-08-14 18:58:35,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2799720.0, ans=0.0 2024-08-14 18:58:46,315 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 18:58:50,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2799820.0, ans=0.2 2024-08-14 18:58:58,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4650, loss[loss=0.1021, beats_loss=0.01197, ecapa_loss=0.0001389, whisper_loss=0.08879, over 22714.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.000157, whisper_loss=0.09046, over 3840556.98 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:59:02,653 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 18:59:09,137 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-280000.pt 2024-08-14 18:59:13,915 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 18:59:16,585 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 18:59:18,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2800020.0, ans=0.125 2024-08-14 18:59:30,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800120.0, ans=0.1 2024-08-14 18:59:41,473 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 18:59:45,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2800220.0, ans=0.125 2024-08-14 18:59:57,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800220.0, ans=0.1 2024-08-14 19:00:14,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2800320.0, ans=0.1 2024-08-14 19:00:17,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4700, loss[loss=0.1108, beats_loss=0.01089, ecapa_loss=0.0001459, whisper_loss=0.09844, over 20361.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001568, whisper_loss=0.09143, over 3867271.35 frames. ], batch size: 81, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:00:21,975 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-14 19:00:24,907 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 19:00:26,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.14 vs. 
limit=15.0 2024-08-14 19:00:38,266 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 19:00:50,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.338e+01 2.588e+01 2.905e+01 3.899e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 19:00:55,690 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 19:01:02,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2800720.0, ans=0.0 2024-08-14 19:01:12,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2800720.0, ans=0.125 2024-08-14 19:01:18,221 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 19:01:18,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=12.0 2024-08-14 19:01:27,254 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 19:01:29,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2800820.0, ans=0.0 2024-08-14 19:01:30,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2800820.0, ans=0.2 2024-08-14 19:01:31,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=22.5 2024-08-14 19:01:33,179 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4750, loss[loss=0.1138, beats_loss=0.01164, ecapa_loss=0.0001357, whisper_loss=0.1008, over 23073.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001545, whisper_loss=0.09117, over 3899357.99 frames. 
], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:01:35,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2800920.0, ans=0.0
2024-08-14 19:02:14,520 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 19:02:17,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2801220.0, ans=0.0
2024-08-14 19:02:23,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2801220.0, ans=0.125
2024-08-14 19:02:33,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2801320.0, ans=0.0
2024-08-14 19:02:41,932 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 from AS
2024-08-14 19:02:47,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4800, loss[loss=0.08206, beats_loss=0.01245, ecapa_loss=0.0001386, whisper_loss=0.06823, over 17262.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.000155, whisper_loss=0.09067, over 3927769.28 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:03:04,423 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 25 from Vox, 23 from AS
2024-08-14 19:03:13,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2801520.0, ans=0.125
2024-08-14 19:03:19,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0
2024-08-14 19:03:20,141 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.344e+01 2.546e+01 2.876e+01 4.578e+02, threshold=5.092e+01, percent-clipped=1.0
2024-08-14 19:03:30,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2801720.0, ans=0.0
2024-08-14 19:03:36,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2801720.0, ans=0.125
2024-08-14 19:03:45,426 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 20 from LS+wenet, 20 from Vox, 13 from AS
2024-08-14 19:03:45,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2801820.0, ans=0.0
2024-08-14 19:04:01,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4850, loss[loss=0.07905, beats_loss=0.01454, ecapa_loss=0.0001239, whisper_loss=0.06327, over 18641.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001553, whisper_loss=0.09171, over 3943376.53 frames. ], batch size: 76, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:04:07,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2801920.0, ans=0.125
2024-08-14 19:04:34,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0
2024-08-14 19:04:47,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2802220.0, ans=0.1
2024-08-14 19:04:48,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2802220.0, ans=0.125
2024-08-14 19:04:51,652 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS
2024-08-14 19:04:56,873 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 from AS
2024-08-14 19:05:12,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2802320.0, ans=0.125
2024-08-14 19:05:17,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4900, loss[loss=0.1256, beats_loss=0.01042, ecapa_loss=0.000127, whisper_loss=0.1139, over 24676.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001551, whisper_loss=0.09173, over 3923609.45 frames. ], batch size: 94, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:05:35,460 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 19:05:49,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0
2024-08-14 19:05:51,103 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 from AS
2024-08-14 19:05:52,201 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.355e+01 2.636e+01 2.883e+01 6.029e+01, threshold=5.271e+01, percent-clipped=1.0
2024-08-14 19:06:07,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2802720.0, ans=0.125
2024-08-14 19:06:17,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0
2024-08-14 19:06:38,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 4950, loss[loss=0.09772, beats_loss=0.0125, ecapa_loss=0.0001236, whisper_loss=0.08398, over 23169.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.000154, whisper_loss=0.09123, over 3921453.53 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:06:39,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2802920.0, ans=0.1
2024-08-14 19:06:41,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.02 vs. limit=10.0
2024-08-14 19:06:53,039 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 from AS
2024-08-14 19:07:14,637 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 19:07:29,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2803220.0, ans=0.125
2024-08-14 19:07:39,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.44 vs. limit=8.0
2024-08-14 19:07:46,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-08-14 19:07:54,044 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5000, loss[loss=0.1242, beats_loss=0.009825, ecapa_loss=0.0001607, whisper_loss=0.1128, over 21875.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001545, whisper_loss=0.09113, over 3897513.31 frames. ], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:08:05,853 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 from AS
2024-08-14 19:08:07,116 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 13 from Vox, 36 from AS
2024-08-14 19:08:08,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2803520.0, ans=0.0
2024-08-14 19:08:10,082 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 from AS
2024-08-14 19:08:25,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.350e+01 2.620e+01 2.995e+01 1.741e+02, threshold=5.241e+01, percent-clipped=2.0
2024-08-14 19:08:30,834 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 20 from Vox, 50 from AS
2024-08-14 19:08:32,417 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 from AS
2024-08-14 19:08:49,488 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS
2024-08-14 19:08:52,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2803820.0, ans=0.0
2024-08-14 19:08:53,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2803820.0, ans=0.09899494936611666
2024-08-14 19:09:03,393 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 from AS
2024-08-14 19:09:06,238 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5050, loss[loss=0.1081, beats_loss=0.01137, ecapa_loss=0.0001407, whisper_loss=0.09529, over 18969.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001547, whisper_loss=0.09118, over 3907109.08 frames. ], batch size: 74, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:09:17,024 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 19 from Vox, 43 from AS
2024-08-14 19:09:20,068 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 from AS
2024-08-14 19:09:22,589 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 from AS
2024-08-14 19:09:45,896 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 19:09:58,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2804220.0, ans=0.0
2024-08-14 19:10:01,078 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS
2024-08-14 19:10:16,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2804320.0, ans=0.125
2024-08-14 19:10:19,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2804420.0, ans=0.125
2024-08-14 19:10:21,044 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5100, loss[loss=0.1098, beats_loss=0.01125, ecapa_loss=0.0001366, whisper_loss=0.09715, over 23168.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01085, ecapa_loss=0.0001533, whisper_loss=0.09097, over 3908924.27 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:10:21,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2804420.0, ans=0.125
2024-08-14 19:10:37,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2804520.0, ans=0.125
2024-08-14 19:10:42,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0
2024-08-14 19:10:49,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2804520.0, ans=0.125
2024-08-14 19:10:56,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.368e+01 2.597e+01 2.934e+01 4.134e+01, threshold=5.194e+01, percent-clipped=0.0
2024-08-14 19:11:03,891 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 19:11:27,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2804820.0, ans=0.125
2024-08-14 19:11:30,445 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 18 from Vox, 35 from AS
2024-08-14 19:11:32,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.24 vs. limit=22.5
2024-08-14 19:11:39,667 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 19:11:40,534 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5150, loss[loss=0.1172, beats_loss=0.01116, ecapa_loss=0.0001856, whisper_loss=0.1041, over 17300.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001533, whisper_loss=0.09075, over 3904061.69 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:11:41,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0
2024-08-14 19:11:43,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0
2024-08-14 19:12:04,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2805020.0, ans=0.1
2024-08-14 19:12:18,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0
2024-08-14 19:12:19,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2805120.0, ans=0.125
2024-08-14 19:12:23,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2805220.0, ans=0.05
2024-08-14 19:12:29,356 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 from AS
2024-08-14 19:12:34,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2805220.0, ans=0.0
2024-08-14 19:12:38,496 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 17 from Vox, 40 from AS
2024-08-14 19:12:52,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2805320.0, ans=0.125
2024-08-14 19:12:54,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5200, loss[loss=0.1032, beats_loss=0.009252, ecapa_loss=0.0001351, whisper_loss=0.09263, over 16483.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001522, whisper_loss=0.09159, over 3909536.13 frames. ], batch size: 60, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:13:01,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2805420.0, ans=0.0
2024-08-14 19:13:03,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2805420.0, ans=0.1
2024-08-14 19:13:14,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=15.0
2024-08-14 19:13:28,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.358e+01 2.582e+01 2.808e+01 4.877e+01, threshold=5.164e+01, percent-clipped=0.0
2024-08-14 19:13:53,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2805720.0, ans=0.2
2024-08-14 19:14:10,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5250, loss[loss=0.105, beats_loss=0.01052, ecapa_loss=0.0001573, whisper_loss=0.09295, over 15175.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001533, whisper_loss=0.09066, over 3873977.98 frames. ], batch size: 61, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:14:13,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2805920.0, ans=0.125
2024-08-14 19:14:14,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2805920.0, ans=0.0
2024-08-14 19:14:29,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2806020.0, ans=0.0
2024-08-14 19:14:43,010 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 from AS
2024-08-14 19:15:21,745 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 from AS
2024-08-14 19:15:22,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2806320.0, ans=0.125
2024-08-14 19:15:27,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5300, loss[loss=0.1088, beats_loss=0.009178, ecapa_loss=0.0001569, whisper_loss=0.09807, over 21226.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001541, whisper_loss=0.09103, over 3882189.11 frames. ], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:15:29,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0
2024-08-14 19:15:30,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806420.0, ans=0.1
2024-08-14 19:15:52,177 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 19:15:52,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2024-08-14 19:16:02,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.456e+01 2.845e+01 4.034e+01, threshold=4.912e+01, percent-clipped=0.0
2024-08-14 19:16:14,509 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 from AS
2024-08-14 19:16:31,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2806820.0, ans=0.0
2024-08-14 19:16:36,785 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 16 from Vox, 38 from AS
2024-08-14 19:16:40,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2806820.0, ans=0.125
2024-08-14 19:16:45,786 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5350, loss[loss=0.09935, beats_loss=0.009317, ecapa_loss=0.0001504, whisper_loss=0.08853, over 14356.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001533, whisper_loss=0.09126, over 3867962.89 frames. ], batch size: 53, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:17:27,193 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 34 from LS+wenet, 25 from Vox, 25 from AS
2024-08-14 19:17:43,804 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 from AS
2024-08-14 19:18:00,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807320.0, ans=0.1
2024-08-14 19:18:04,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2807320.0, ans=0.125
2024-08-14 19:18:13,089 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5400, loss[loss=0.1032, beats_loss=0.01009, ecapa_loss=0.0001758, whisper_loss=0.09132, over 22358.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001527, whisper_loss=0.09129, over 3893789.70 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:18:22,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2807420.0, ans=0.0
2024-08-14 19:18:28,290 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 from AS
2024-08-14 19:18:29,846 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 19:18:50,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.371e+01 2.761e+01 3.113e+01 5.866e+01, threshold=5.523e+01, percent-clipped=1.0
2024-08-14 19:18:50,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2807620.0, ans=0.125
2024-08-14 19:19:04,874 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 20 from Vox, 48 from AS
2024-08-14 19:19:34,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2807820.0, ans=0.125
2024-08-14 19:19:43,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5450, loss[loss=0.1092, beats_loss=0.009053, ecapa_loss=0.0001905, whisper_loss=0.09825, over 21291.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01065, ecapa_loss=0.0001531, whisper_loss=0.09175, over 3871924.42 frames. ], batch size: 88, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:20:20,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2808020.0, ans=0.125
2024-08-14 19:20:24,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2808120.0, ans=0.125
2024-08-14 19:20:30,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-08-14 19:20:34,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2024-08-14 19:20:52,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2808220.0, ans=0.1
2024-08-14 19:20:56,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2808220.0, ans=0.125
2024-08-14 19:21:00,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2808220.0, ans=0.125
2024-08-14 19:21:10,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2808320.0, ans=0.125
2024-08-14 19:21:14,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2808320.0, ans=0.2
2024-08-14 19:21:18,338 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 from AS
2024-08-14 19:21:23,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5500, loss[loss=0.09114, beats_loss=0.01088, ecapa_loss=0.0001604, whisper_loss=0.07866, over 22063.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001538, whisper_loss=0.09132, over 3907556.79 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:21:50,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2808520.0, ans=0.0
2024-08-14 19:22:03,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2808620.0, ans=22.5
2024-08-14 19:22:05,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2808620.0, ans=0.0
2024-08-14 19:22:05,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2808620.0, ans=0.125
2024-08-14 19:22:06,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2808620.0, ans=0.125
2024-08-14 19:22:09,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.424e+01 2.779e+01 3.099e+01 3.330e+02, threshold=5.557e+01, percent-clipped=2.0
2024-08-14 19:22:35,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2808720.0, ans=0.0
2024-08-14 19:22:37,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2808720.0, ans=0.0
2024-08-14 19:22:52,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=2808820.0, ans=0.02
2024-08-14 19:23:08,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2808820.0, ans=0.125
2024-08-14 19:23:11,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5550, loss[loss=0.1114, beats_loss=0.009991, ecapa_loss=0.0001431, whisper_loss=0.1, over 22841.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001533, whisper_loss=0.09134, over 3921974.17 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:23:20,588 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 24 from Vox, 35 from AS
2024-08-14 19:23:34,955 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 19 from LS+wenet, 23 from Vox, 44 from AS
2024-08-14 19:23:41,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2809020.0, ans=0.125
2024-08-14 19:23:44,073 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 33 from Vox, 31 from AS
2024-08-14 19:23:51,949 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 19:23:53,569 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 32 from Vox, 39 from AS
2024-08-14 19:24:06,255 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 29 from Vox, 21 from AS
2024-08-14 19:24:29,556 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 37 from LS+wenet, 21 from Vox, 29 from AS
2024-08-14 19:24:51,575 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 from AS
2024-08-14 19:24:52,701 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5600, loss[loss=0.1071, beats_loss=0.01112, ecapa_loss=0.0001309, whisper_loss=0.0947, over 23177.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001529, whisper_loss=0.09173, over 3947247.67 frames. ], batch size: 92, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:25:06,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2809520.0, ans=0.95
2024-08-14 19:25:24,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.292e+01 2.694e+01 2.993e+01 3.874e+01, threshold=5.387e+01, percent-clipped=0.0
2024-08-14 19:25:29,064 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 19:25:40,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2809720.0, ans=0.0
2024-08-14 19:26:03,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2809920.0, ans=0.0
2024-08-14 19:26:04,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2024-08-14 19:26:04,762 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5650, loss[loss=0.09272, beats_loss=0.01048, ecapa_loss=0.0001836, whisper_loss=0.0804, over 17240.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01061, ecapa_loss=0.0001529, whisper_loss=0.09224, over 3953700.70 frames. ], batch size: 75, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:26:17,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2809920.0, ans=0.0
2024-08-14 19:26:21,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2810020.0, ans=0.05
2024-08-14 19:26:33,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2810120.0, ans=0.125
2024-08-14 19:26:36,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2810120.0, ans=0.2
2024-08-14 19:26:42,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2810120.0, ans=0.0
2024-08-14 19:26:50,001 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 from AS
2024-08-14 19:26:54,257 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 from AS
2024-08-14 19:27:00,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2810220.0, ans=0.125
2024-08-14 19:27:08,204 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 19:27:19,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5700, loss[loss=0.1125, beats_loss=0.01065, ecapa_loss=0.0001569, whisper_loss=0.1003, over 21102.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0106, ecapa_loss=0.0001542, whisper_loss=0.09231, over 3961680.64 frames. ], batch size: 87, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:27:26,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2810420.0, ans=0.125
2024-08-14 19:27:31,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0
2024-08-14 19:27:50,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2810620.0, ans=0.1
2024-08-14 19:27:51,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.307e+01 2.514e+01 2.816e+01 4.087e+01, threshold=5.028e+01, percent-clipped=0.0
2024-08-14 19:28:02,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2810720.0, ans=0.125
2024-08-14 19:28:08,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2810720.0, ans=0.05
2024-08-14 19:28:14,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2810720.0, ans=0.125
2024-08-14 19:28:15,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2810720.0, ans=0.05
2024-08-14 19:28:32,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5750, loss[loss=0.128, beats_loss=0.007857, ecapa_loss=0.0001673, whisper_loss=0.1185, over 14197.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01053, ecapa_loss=0.0001535, whisper_loss=0.09238, over 3928720.69 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:28:49,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2811020.0, ans=0.0
2024-08-14 19:28:55,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0
2024-08-14 19:28:59,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2811020.0, ans=0.1
2024-08-14 19:29:15,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2811120.0, ans=0.2
2024-08-14 19:29:21,086 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 12 from Vox, 31 from AS
2024-08-14 19:29:25,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2811220.0, ans=0.125
2024-08-14 19:29:36,029 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 from AS
2024-08-14 19:29:41,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0
2024-08-14 19:29:49,366 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5800, loss[loss=0.1008, beats_loss=0.01055, ecapa_loss=0.0001475, whisper_loss=0.08878, over 18595.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0105, ecapa_loss=0.0001535, whisper_loss=0.09228, over 3905462.17 frames. ], batch size: 75, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:29:51,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2811420.0, ans=0.2
2024-08-14 19:30:02,156 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 from AS
2024-08-14 19:30:22,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.246e+01 2.501e+01 2.765e+01 4.187e+01, threshold=5.003e+01, percent-clipped=0.0
2024-08-14 19:30:28,116 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 from AS
2024-08-14 19:30:36,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2811720.0, ans=0.125
2024-08-14 19:30:38,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2811720.0, ans=0.125
2024-08-14 19:30:57,323 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 18 from Vox, 40 from AS
2024-08-14 19:30:57,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2811820.0, ans=0.0
2024-08-14 19:31:03,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5850, loss[loss=0.101, beats_loss=0.01372, ecapa_loss=0.0001166, whisper_loss=0.08615, over 22663.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01057, ecapa_loss=0.0001537, whisper_loss=0.09192, over 3922765.54 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:31:22,469 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 19:31:25,485 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 22 from Vox, 22 from AS
2024-08-14 19:31:37,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2812120.0, ans=0.0
2024-08-14 19:31:38,644 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 16 from Vox, 33 from AS
2024-08-14 19:31:42,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2812120.0, ans=0.0
2024-08-14 19:31:49,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2812220.0, ans=0.125
2024-08-14 19:31:53,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2812220.0, ans=0.0
2024-08-14 19:32:16,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5900, loss[loss=0.09335, beats_loss=0.01096, ecapa_loss=0.0001335, whisper_loss=0.08106, over 18684.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01059, ecapa_loss=0.0001531, whisper_loss=0.09199, over 3909265.50 frames. ], batch size: 73, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:32:23,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2812420.0, ans=0.125
2024-08-14 19:32:25,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2812420.0, ans=0.125
2024-08-14 19:32:33,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2812520.0, ans=0.1
2024-08-14 19:32:49,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.347e+01 2.667e+01 3.027e+01 4.357e+01, threshold=5.334e+01, percent-clipped=0.0
2024-08-14 19:32:49,749 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 from AS
2024-08-14 19:33:00,818 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-14 19:33:07,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2812720.0, ans=0.05
2024-08-14 19:33:10,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.87 vs. limit=22.5
2024-08-14 19:33:11,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2812720.0, ans=0.0
2024-08-14 19:33:24,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2812820.0, ans=0.0
2024-08-14 19:33:26,896 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 from AS
2024-08-14 19:33:30,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2812920.0, ans=0.0
2024-08-14 19:33:30,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 5950, loss[loss=0.1102, beats_loss=0.01037, ecapa_loss=0.0001366, whisper_loss=0.09849, over 16232.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001537, whisper_loss=0.0913, over 3869425.77 frames. ], batch size: 63, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:33:38,580 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 21 from LS+wenet, 29 from Vox, 45 from AS
2024-08-14 19:34:05,910 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 from AS
2024-08-14 19:34:07,404 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 17 from Vox, 36 from AS
2024-08-14 19:34:25,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0
2024-08-14 19:34:44,117 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS
2024-08-14 19:34:45,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6000, loss[loss=0.1082, beats_loss=0.01022, ecapa_loss=0.0001705, whisper_loss=0.09626, over 21604.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001538, whisper_loss=0.09117, over 3876673.80 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17
2024-08-14 19:34:45,149 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-14 19:35:23,084 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005442, whisper_loss=0.2472, over 922467.00 frames.
2024-08-14 19:35:42,588 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on SV_voxceleb1: loss=0.004201, beats_loss=0, ecapa_loss=0.0004201, whisper_loss=0, over 939242.00 frames.
2024-08-14 19:37:36,255 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 19:37:36,259 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB
2024-08-14 19:37:40,797 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 from AS
2024-08-14 19:37:50,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2813520.0, ans=0.0
2024-08-14 19:37:54,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0
2024-08-14 19:38:10,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.295e+01 2.518e+01 2.791e+01 2.335e+02, threshold=5.037e+01, percent-clipped=2.0
2024-08-14 19:38:53,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6050, loss[loss=0.1079, beats_loss=0.01057, ecapa_loss=0.0001257, whisper_loss=0.09607, over 18955.00 frames.
], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.000155, whisper_loss=0.09185, over 3907560.79 frames. ], batch size: 74, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:39:01,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2813920.0, ans=0.0 2024-08-14 19:39:11,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2814020.0, ans=0.125 2024-08-14 19:39:18,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2814020.0, ans=0.0 2024-08-14 19:39:21,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2814120.0, ans=0.125 2024-08-14 19:39:52,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814320.0, ans=0.1 2024-08-14 19:40:06,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6100, loss[loss=0.1218, beats_loss=0.006267, ecapa_loss=0.0001822, whisper_loss=0.1137, over 20794.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001545, whisper_loss=0.09206, over 3919093.44 frames. ], batch size: 78, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:40:15,562 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 19:40:38,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.270e+01 2.572e+01 2.867e+01 4.147e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-14 19:40:42,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2814620.0, ans=0.0 2024-08-14 19:40:44,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=22.5 2024-08-14 19:40:49,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2814720.0, ans=0.2 2024-08-14 19:41:00,681 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 19:41:12,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2814820.0, ans=0.025 2024-08-14 19:41:19,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6150, loss[loss=0.08499, beats_loss=0.0116, ecapa_loss=0.0001572, whisper_loss=0.07181, over 18738.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01065, ecapa_loss=0.0001554, whisper_loss=0.09167, over 3895142.66 frames. ], batch size: 77, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:41:25,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2814920.0, ans=0.0 2024-08-14 19:41:28,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2814920.0, ans=0.125 2024-08-14 19:41:44,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2815020.0, ans=0.0 2024-08-14 19:41:55,649 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 19:42:00,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2815120.0, ans=0.125 2024-08-14 19:42:01,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2815120.0, ans=0.2 2024-08-14 19:42:05,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2815220.0, ans=0.0 2024-08-14 19:42:30,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2815320.0, ans=0.125 2024-08-14 19:42:32,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6200, loss[loss=0.09166, beats_loss=0.01005, ecapa_loss=0.000144, whisper_loss=0.08016, over 15179.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01064, ecapa_loss=0.0001543, whisper_loss=0.09195, over 3913951.44 frames. ], batch size: 57, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:42:39,228 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 19:42:43,672 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 19:42:43,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2815420.0, ans=0.125 2024-08-14 19:42:45,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2815420.0, ans=0.125 2024-08-14 19:43:05,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.332e+01 2.614e+01 2.876e+01 4.461e+01, threshold=5.229e+01, percent-clipped=0.0 2024-08-14 19:43:07,607 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 19:43:18,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2815720.0, ans=0.0 2024-08-14 19:43:28,877 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 19:43:29,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2815720.0, ans=0.2 2024-08-14 19:43:48,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6250, loss[loss=0.1163, beats_loss=0.011, ecapa_loss=0.0001646, whisper_loss=0.1037, over 22330.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001537, whisper_loss=0.09133, over 3893036.96 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:43:50,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2815920.0, ans=0.2 2024-08-14 19:44:09,745 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-14 19:44:12,690 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 14 from Vox, 52 fro AS 2024-08-14 19:44:22,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2816120.0, ans=0.0 2024-08-14 19:44:37,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. 
limit=15.0 2024-08-14 19:44:48,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2816320.0, ans=0.0 2024-08-14 19:44:51,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2816320.0, ans=0.125 2024-08-14 19:45:01,424 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6300, loss[loss=0.1143, beats_loss=0.01017, ecapa_loss=0.0001421, whisper_loss=0.1027, over 23420.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001524, whisper_loss=0.09119, over 3877723.84 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:45:01,634 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 19:45:03,153 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 19:45:13,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2816420.0, ans=0.0 2024-08-14 19:45:33,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.242e+01 2.428e+01 2.656e+01 5.822e+01, threshold=4.856e+01, percent-clipped=1.0 2024-08-14 19:45:37,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2816620.0, ans=0.0 2024-08-14 19:45:40,195 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 19:45:58,094 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 19:46:13,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6350, loss[loss=0.101, beats_loss=0.01014, ecapa_loss=0.0001783, whisper_loss=0.08911, over 19973.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001527, whisper_loss=0.09121, over 3845006.77 frames. 
], batch size: 84, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:46:17,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2816920.0, ans=0.125 2024-08-14 19:46:36,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2817020.0, ans=0.0 2024-08-14 19:46:42,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2817120.0, ans=0.125 2024-08-14 19:46:45,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2817120.0, ans=0.1 2024-08-14 19:46:46,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-08-14 19:47:12,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2817320.0, ans=0.125 2024-08-14 19:47:26,015 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-14 19:47:28,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6400, loss[loss=0.09838, beats_loss=0.007467, ecapa_loss=0.0001698, whisper_loss=0.08922, over 13957.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001531, whisper_loss=0.09073, over 3855116.66 frames. 
], batch size: 56, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:47:40,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2817420.0, ans=0.125 2024-08-14 19:47:48,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2817520.0, ans=0.0 2024-08-14 19:48:01,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.350e+01 2.618e+01 2.916e+01 9.868e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-14 19:48:07,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2817620.0, ans=0.0 2024-08-14 19:48:14,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2817720.0, ans=0.1 2024-08-14 19:48:19,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2817720.0, ans=0.2 2024-08-14 19:48:25,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2817720.0, ans=0.0 2024-08-14 19:48:29,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.30 vs. limit=6.0 2024-08-14 19:48:42,694 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6450, loss[loss=0.104, beats_loss=0.01054, ecapa_loss=0.0001884, whisper_loss=0.09158, over 17085.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001521, whisper_loss=0.09037, over 3894012.14 frames. ], batch size: 72, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:48:44,490 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 19:48:45,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2024-08-14 19:48:56,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2818020.0, ans=0.125 2024-08-14 19:48:57,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2818020.0, ans=0.125 2024-08-14 19:49:16,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2818120.0, ans=0.125 2024-08-14 19:49:21,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2818120.0, ans=0.0 2024-08-14 19:49:26,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2818120.0, ans=0.2 2024-08-14 19:49:46,559 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 19:49:57,604 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 19:50:00,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6500, loss[loss=0.1087, beats_loss=0.0106, ecapa_loss=0.0001716, whisper_loss=0.09641, over 17102.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001541, whisper_loss=0.09093, over 3883949.87 frames. ], batch size: 70, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:50:00,448 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 19:50:02,415 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:50:03,405 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 19:50:11,414 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 19:50:24,823 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 19:50:31,009 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 19:50:31,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2818620.0, ans=0.125 2024-08-14 19:50:35,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.71 vs. limit=15.0 2024-08-14 19:50:35,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.395e+01 2.629e+01 2.951e+01 4.669e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-14 19:50:59,256 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 19:51:16,474 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6550, loss[loss=0.1312, beats_loss=0.008502, ecapa_loss=0.000146, whisper_loss=0.1213, over 17482.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001537, whisper_loss=0.09093, over 3855160.87 frames. ], batch size: 67, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:51:17,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2818920.0, ans=0.125 2024-08-14 19:51:23,175 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 19:51:23,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2818920.0, ans=0.125 2024-08-14 19:51:34,335 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-14 19:51:35,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2819020.0, ans=0.125 2024-08-14 19:51:41,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2819020.0, ans=0.2 2024-08-14 19:51:44,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2819020.0, ans=0.125 2024-08-14 19:51:50,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2024-08-14 19:52:18,478 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 19:52:34,296 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 34 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 19:52:36,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6600, loss[loss=0.1077, beats_loss=0.01006, ecapa_loss=0.0001684, whisper_loss=0.09595, over 22873.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001555, whisper_loss=0.09176, over 3881404.70 frames. ], batch size: 96, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:52:36,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2024-08-14 19:52:49,652 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 19:52:54,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2819520.0, ans=0.0 2024-08-14 19:52:56,164 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
34 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 19:52:56,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2024-08-14 19:53:06,118 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.848e+00 2024-08-14 19:53:10,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2819620.0, ans=0.04949747468305833 2024-08-14 19:53:13,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.460e+01 2.689e+01 3.191e+01 5.178e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-14 19:53:14,846 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 19:53:30,817 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 19:53:55,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6650, loss[loss=0.1188, beats_loss=0.009523, ecapa_loss=0.0001527, whisper_loss=0.1078, over 22993.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01059, ecapa_loss=0.0001556, whisper_loss=0.09199, over 3900553.57 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:53:56,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2819920.0, ans=0.0 2024-08-14 19:54:04,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2819920.0, ans=0.2 2024-08-14 19:54:32,204 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 19:54:48,999 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 19:55:15,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6700, loss[loss=0.09326, beats_loss=0.01047, ecapa_loss=0.0001567, whisper_loss=0.08122, over 17296.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001554, whisper_loss=0.09138, over 3905767.76 frames. ], batch size: 71, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:55:27,879 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 19:55:29,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-08-14 19:55:48,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2820620.0, ans=0.125 2024-08-14 19:55:50,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.346e+01 2.527e+01 2.810e+01 4.755e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-14 19:55:53,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2820620.0, ans=0.1 2024-08-14 19:56:22,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-14 19:56:31,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2024-08-14 19:56:32,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6750, loss[loss=0.09379, beats_loss=0.01209, ecapa_loss=0.0001484, whisper_loss=0.08022, over 21565.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.000155, whisper_loss=0.09177, over 3947216.76 frames. 
], batch size: 91, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:56:55,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2821020.0, ans=0.0 2024-08-14 19:57:01,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2821020.0, ans=0.035 2024-08-14 19:57:01,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2821020.0, ans=0.0 2024-08-14 19:57:07,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2821120.0, ans=0.0 2024-08-14 19:57:07,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2821120.0, ans=0.1 2024-08-14 19:57:13,653 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 19:57:29,383 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 19:57:30,938 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 19:57:41,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2821320.0, ans=0.125 2024-08-14 19:57:50,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6800, loss[loss=0.0925, beats_loss=0.01267, ecapa_loss=0.0001318, whisper_loss=0.07851, over 21861.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01059, ecapa_loss=0.0001557, whisper_loss=0.09214, over 3955955.92 frames. ], batch size: 90, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:58:04,912 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:58:07,836 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
19 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 19:58:08,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2821520.0, ans=0.125 2024-08-14 19:58:09,676 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 36 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 19:58:10,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=12.0 2024-08-14 19:58:11,136 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 19:58:14,079 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-14 19:58:27,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.395e+01 2.601e+01 3.094e+01 9.420e+01, threshold=5.202e+01, percent-clipped=3.0 2024-08-14 19:58:38,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-14 19:58:47,671 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 19:58:52,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-08-14 19:58:55,280 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:59:02,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2821820.0, ans=0.125 2024-08-14 19:59:08,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6850, loss[loss=0.07264, beats_loss=0.0117, ecapa_loss=0.0001963, whisper_loss=0.05898, over 14524.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001561, whisper_loss=0.09155, over 3938940.48 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:59:11,259 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 19:59:20,240 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 19:59:29,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2822020.0, ans=10.0 2024-08-14 19:59:32,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2822020.0, ans=0.125 2024-08-14 19:59:33,843 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-14 19:59:53,952 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 20:00:19,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2822320.0, ans=0.125 2024-08-14 20:00:23,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6900, loss[loss=0.09938, beats_loss=0.009036, ecapa_loss=0.0001376, whisper_loss=0.08897, over 15081.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001551, whisper_loss=0.09109, over 3922671.49 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:00:33,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2822420.0, ans=0.125 2024-08-14 20:00:34,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2822420.0, ans=0.0 2024-08-14 20:00:41,208 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 20:00:44,053 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 20:00:45,632 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 20:00:49,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2822520.0, ans=0.05 2024-08-14 20:00:50,336 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 27 from Vox, 15 fro AS 2024-08-14 20:00:59,560 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.271e+01 2.537e+01 2.771e+01 4.123e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-14 20:01:03,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2822620.0, ans=0.0 2024-08-14 20:01:09,289 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 20:01:18,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2822720.0, ans=0.125 2024-08-14 20:01:18,386 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:01:19,569 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 20:01:21,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2822720.0, ans=0.1 2024-08-14 20:01:40,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 6950, loss[loss=0.1051, beats_loss=0.01116, ecapa_loss=0.0001411, whisper_loss=0.09249, over 20845.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001555, whisper_loss=0.09094, over 3926706.60 frames. 
], batch size: 84, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:01:45,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2822920.0, ans=0.125 2024-08-14 20:01:52,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2822920.0, ans=0.125 2024-08-14 20:02:08,411 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 20:02:08,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2823020.0, ans=0.09899494936611666 2024-08-14 20:02:09,837 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 20:02:11,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2823120.0, ans=0.0 2024-08-14 20:02:22,270 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 33 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 20:02:27,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2823220.0, ans=0.0 2024-08-14 20:02:27,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2823220.0, ans=0.125 2024-08-14 20:02:32,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2823220.0, ans=0.015 2024-08-14 20:02:34,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2823220.0, ans=0.2 2024-08-14 20:02:35,983 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 20:02:41,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2024-08-14 20:02:53,221 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-14 20:02:55,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7000, loss[loss=0.1102, beats_loss=0.01096, ecapa_loss=0.000164, whisper_loss=0.09756, over 23526.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.0001547, whisper_loss=0.0907, over 3907079.86 frames. ], batch size: 95, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:02:59,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-14 20:03:01,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2823420.0, ans=0.04949747468305833 2024-08-14 20:03:03,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2823420.0, ans=0.2 2024-08-14 20:03:08,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2823420.0, ans=0.125 2024-08-14 20:03:09,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.59 vs. 
limit=22.5 2024-08-14 20:03:11,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2823520.0, ans=0.0 2024-08-14 20:03:29,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.376e+01 2.618e+01 2.959e+01 4.269e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-14 20:03:37,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2823620.0, ans=0.1 2024-08-14 20:04:01,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0 2024-08-14 20:04:09,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7050, loss[loss=0.09824, beats_loss=0.009359, ecapa_loss=0.0001757, whisper_loss=0.08712, over 20896.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001548, whisper_loss=0.09036, over 3906436.38 frames. ], batch size: 83, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:04:18,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2823920.0, ans=0.125 2024-08-14 20:04:22,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5 2024-08-14 20:04:23,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2824020.0, ans=0.2 2024-08-14 20:04:27,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2824020.0, ans=0.125 2024-08-14 20:04:30,153 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 20:04:44,320 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 20:05:00,084 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 20:05:09,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2824320.0, ans=0.125 2024-08-14 20:05:17,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2824320.0, ans=0.0 2024-08-14 20:05:24,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7100, loss[loss=0.1073, beats_loss=0.01222, ecapa_loss=0.0001218, whisper_loss=0.09381, over 19290.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01083, ecapa_loss=0.0001537, whisper_loss=0.08974, over 3868638.23 frames. ], batch size: 75, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:05:31,110 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.732e-01 2024-08-14 20:05:44,590 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 20:05:52,063 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-14 20:05:55,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2824620.0, ans=0.015 2024-08-14 20:06:00,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.271e+01 2.594e+01 2.929e+01 4.373e+01, threshold=5.188e+01, percent-clipped=0.0 2024-08-14 20:06:12,722 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 20:06:14,126 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 20:06:18,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2824720.0, ans=0.1 2024-08-14 20:06:38,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7150, loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001323, whisper_loss=0.09097, over 22881.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01079, ecapa_loss=0.0001532, whisper_loss=0.09007, over 3873530.60 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:06:45,345 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.829e+01 2024-08-14 20:07:04,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2825020.0, ans=0.0 2024-08-14 20:07:22,340 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 20:07:28,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2825220.0, ans=0.125 2024-08-14 20:07:53,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7200, loss[loss=0.1073, beats_loss=0.008244, ecapa_loss=0.0001859, whisper_loss=0.0972, over 14867.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001536, whisper_loss=0.09091, over 3871116.34 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:07:59,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2825420.0, ans=0.0 2024-08-14 20:07:59,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.68 vs. 
limit=12.0 2024-08-14 20:08:01,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-08-14 20:08:15,506 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 20:08:18,178 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 20:08:27,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.392e+01 2.665e+01 3.083e+01 4.439e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-14 20:08:28,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2825620.0, ans=0.125 2024-08-14 20:08:29,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2825620.0, ans=0.125 2024-08-14 20:08:51,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2825820.0, ans=0.05 2024-08-14 20:09:06,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7250, loss[loss=0.1154, beats_loss=0.009675, ecapa_loss=0.0001605, whisper_loss=0.1041, over 22256.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001537, whisper_loss=0.09146, over 3895900.83 frames. 
], batch size: 90, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:09:07,370 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.254e+02 2024-08-14 20:09:18,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2825920.0, ans=0.0 2024-08-14 20:09:18,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2825920.0, ans=0.0 2024-08-14 20:09:22,943 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 20:09:24,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2826020.0, ans=0.125 2024-08-14 20:09:45,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2826120.0, ans=0.2 2024-08-14 20:09:53,986 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 20:09:55,512 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.548e-03 2024-08-14 20:10:19,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7300, loss[loss=0.08308, beats_loss=0.01092, ecapa_loss=0.0001714, whisper_loss=0.07045, over 18297.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.000154, whisper_loss=0.09081, over 3896002.83 frames. ], batch size: 75, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:10:21,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2826420.0, ans=0.0 2024-08-14 20:10:56,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. 
limit=15.0 2024-08-14 20:11:04,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2826520.0, ans=0.2 2024-08-14 20:11:22,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826620.0, ans=0.1 2024-08-14 20:11:26,236 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.413e+01 2.619e+01 3.021e+01 6.286e+01, threshold=5.238e+01, percent-clipped=1.0 2024-08-14 20:12:00,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2826820.0, ans=0.125 2024-08-14 20:12:01,690 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 20:12:04,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7350, loss[loss=0.1055, beats_loss=0.00992, ecapa_loss=0.0001314, whisper_loss=0.09422, over 19854.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01059, ecapa_loss=0.0001547, whisper_loss=0.09182, over 3900608.96 frames. 
], batch size: 78, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:12:10,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2826920.0, ans=0.1 2024-08-14 20:12:17,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2827020.0, ans=0.125 2024-08-14 20:12:19,362 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:12:26,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2827020.0, ans=0.125 2024-08-14 20:12:37,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2827120.0, ans=0.04949747468305833 2024-08-14 20:12:54,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2827220.0, ans=0.125 2024-08-14 20:13:21,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7400, loss[loss=0.1023, beats_loss=0.009439, ecapa_loss=0.0002218, whisper_loss=0.09068, over 15619.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0106, ecapa_loss=0.0001552, whisper_loss=0.09167, over 3903128.96 frames. 
], batch size: 67, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:13:22,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2827420.0, ans=0.125 2024-08-14 20:13:34,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2827420.0, ans=0.125 2024-08-14 20:13:36,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2827520.0, ans=0.07 2024-08-14 20:13:50,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2827520.0, ans=0.0 2024-08-14 20:13:53,607 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 20:13:59,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.369e+01 2.692e+01 3.042e+01 1.751e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-14 20:14:11,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2827720.0, ans=0.125 2024-08-14 20:14:16,611 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 20:14:27,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.64 vs. limit=8.0 2024-08-14 20:14:40,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7450, loss[loss=0.1144, beats_loss=0.01024, ecapa_loss=0.0001679, whisper_loss=0.1025, over 21724.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01059, ecapa_loss=0.0001564, whisper_loss=0.09171, over 3891119.68 frames. 
], batch size: 90, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:15:02,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2828020.0, ans=0.125 2024-08-14 20:15:05,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.26 vs. limit=22.5 2024-08-14 20:15:20,258 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 20:15:25,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=12.0 2024-08-14 20:15:29,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2828120.0, ans=0.2 2024-08-14 20:15:30,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.90 vs. limit=15.0 2024-08-14 20:15:40,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2828220.0, ans=0.125 2024-08-14 20:15:53,660 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 20:16:04,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2828320.0, ans=0.125 2024-08-14 20:16:05,387 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 20:16:15,680 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7500, loss[loss=0.1123, beats_loss=0.01046, ecapa_loss=0.0001328, whisper_loss=0.1005, over 20192.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01065, ecapa_loss=0.0001555, whisper_loss=0.09173, over 3929586.86 frames. 
], batch size: 80, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:16:34,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-14 20:16:42,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2828520.0, ans=0.125 2024-08-14 20:16:44,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2828520.0, ans=0.0 2024-08-14 20:17:01,588 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.331e+01 2.565e+01 2.874e+01 3.652e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-14 20:17:04,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2828620.0, ans=0.2 2024-08-14 20:17:16,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2828720.0, ans=0.0 2024-08-14 20:17:30,940 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 20:17:51,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7550, loss[loss=0.1066, beats_loss=0.007644, ecapa_loss=0.0002184, whisper_loss=0.09681, over 13839.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001557, whisper_loss=0.09148, over 3894668.50 frames. ], batch size: 58, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:18:14,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.67 vs. 
limit=12.0 2024-08-14 20:18:19,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2829020.0, ans=0.125 2024-08-14 20:18:22,562 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 20:18:30,178 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 20:18:35,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2829120.0, ans=0.125 2024-08-14 20:18:54,405 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 20:18:56,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-14 20:19:06,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2829320.0, ans=0.015 2024-08-14 20:19:11,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-14 20:19:25,942 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7600, loss[loss=0.08986, beats_loss=0.01045, ecapa_loss=0.0001663, whisper_loss=0.07775, over 20954.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001565, whisper_loss=0.09116, over 3886663.22 frames. ], batch size: 86, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:19:45,983 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 20:20:00,844 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
11 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 20:20:08,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.345e+01 2.622e+01 3.093e+01 1.598e+02, threshold=5.244e+01, percent-clipped=3.0 2024-08-14 20:20:10,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2829620.0, ans=0.0 2024-08-14 20:20:15,425 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 20:20:23,823 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 20:20:27,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2024-08-14 20:20:31,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2829820.0, ans=0.125 2024-08-14 20:20:32,129 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 20:20:35,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2829820.0, ans=0.0 2024-08-14 20:20:36,671 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 20:20:37,848 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-14 20:20:42,249 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-14 20:20:46,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7650, loss[loss=0.1101, beats_loss=0.01057, ecapa_loss=0.0001718, whisper_loss=0.09779, over 22099.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001555, whisper_loss=0.09026, over 3878556.81 frames. 
], batch size: 91, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:21:02,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2830020.0, ans=0.125 2024-08-14 20:21:02,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-14 20:21:03,052 WARNING [optim.py:496] (0/4) Scaling gradients by 0.061782095581293106, model_norm_threshold=52.43657684326172 2024-08-14 20:21:03,235 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.640e+05, grad_sumsq=1.648e+07, orig_rms_sq=9.952e-03 2024-08-14 20:21:10,995 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 20:21:29,196 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 20:21:38,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=12.0 2024-08-14 20:21:41,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2830320.0, ans=0.2 2024-08-14 20:21:54,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2830320.0, ans=0.125 2024-08-14 20:21:56,094 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 20:21:57,171 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7700, loss[loss=0.09934, beats_loss=0.01158, ecapa_loss=0.0001899, whisper_loss=0.08586, over 16094.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001545, whisper_loss=0.0899, over 3853632.54 frames. 
], batch size: 67, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:22:00,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2830420.0, ans=0.125 2024-08-14 20:22:30,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.409e+01 2.589e+01 2.990e+01 8.487e+02, threshold=5.178e+01, percent-clipped=3.0 2024-08-14 20:22:32,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2830620.0, ans=0.125 2024-08-14 20:22:44,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2830720.0, ans=0.0 2024-08-14 20:22:51,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2830720.0, ans=0.2 2024-08-14 20:23:00,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2830820.0, ans=0.0 2024-08-14 20:23:04,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2830820.0, ans=0.1 2024-08-14 20:23:07,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2830920.0, ans=0.125 2024-08-14 20:23:08,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7750, loss[loss=0.1201, beats_loss=0.008622, ecapa_loss=0.0002097, whisper_loss=0.1094, over 21390.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.000155, whisper_loss=0.09046, over 3868082.87 frames. 
], batch size: 89, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:23:15,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2830920.0, ans=10.0 2024-08-14 20:23:15,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-14 20:23:18,642 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 20:23:30,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2831020.0, ans=0.125 2024-08-14 20:23:32,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2831020.0, ans=0.125 2024-08-14 20:23:48,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.36 vs. limit=22.5 2024-08-14 20:23:58,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.33 vs. limit=22.5 2024-08-14 20:24:17,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2831320.0, ans=0.1 2024-08-14 20:24:19,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7800, loss[loss=0.09665, beats_loss=0.01081, ecapa_loss=0.0001506, whisper_loss=0.08434, over 20453.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001536, whisper_loss=0.09049, over 3885831.01 frames. ], batch size: 83, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:24:20,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.23 vs. 
limit=15.0 2024-08-14 20:24:23,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2831420.0, ans=0.2 2024-08-14 20:24:25,851 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 20:24:54,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.359e+01 2.580e+01 2.928e+01 4.088e+01, threshold=5.160e+01, percent-clipped=0.0 2024-08-14 20:24:55,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2831620.0, ans=0.2 2024-08-14 20:25:05,675 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 20:25:06,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2831720.0, ans=0.125 2024-08-14 20:25:23,691 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 20:25:32,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7850, loss[loss=0.0975, beats_loss=0.01132, ecapa_loss=0.0001816, whisper_loss=0.08437, over 21727.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001539, whisper_loss=0.09055, over 3871371.58 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:25:38,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2831920.0, ans=0.2 2024-08-14 20:25:48,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2832020.0, ans=0.125 2024-08-14 20:25:54,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.65 vs. 
limit=15.0 2024-08-14 20:25:58,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-08-14 20:26:07,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2832120.0, ans=0.07 2024-08-14 20:26:15,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2832220.0, ans=0.1 2024-08-14 20:26:15,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2832220.0, ans=0.0 2024-08-14 20:26:18,244 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 20:26:30,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2832320.0, ans=0.125 2024-08-14 20:26:39,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2832320.0, ans=0.125 2024-08-14 20:26:43,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7900, loss[loss=0.104, beats_loss=0.01357, ecapa_loss=0.0001037, whisper_loss=0.08938, over 15442.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001535, whisper_loss=0.09033, over 3866609.21 frames. 
], batch size: 61, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:26:45,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2832420.0, ans=0.0 2024-08-14 20:27:18,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.337e+01 2.582e+01 2.870e+01 4.311e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 20:27:48,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2832820.0, ans=0.0 2024-08-14 20:27:54,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2832820.0, ans=0.2 2024-08-14 20:27:56,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 7950, loss[loss=0.09104, beats_loss=0.01116, ecapa_loss=0.000171, whisper_loss=0.07818, over 16224.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.000154, whisper_loss=0.09076, over 3883919.54 frames. ], batch size: 69, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:28:01,355 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-14 20:28:13,961 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 20:28:40,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2833220.0, ans=0.1 2024-08-14 20:28:40,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2833220.0, ans=0.125 2024-08-14 20:28:43,234 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 20:28:56,304 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 20:28:57,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2833320.0, ans=0.0 2024-08-14 20:29:09,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8000, loss[loss=0.1117, beats_loss=0.01046, ecapa_loss=0.0001642, whisper_loss=0.09957, over 16931.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001526, whisper_loss=0.0904, over 3877668.90 frames. ], batch size: 63, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:29:09,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2833420.0, ans=0.0 2024-08-14 20:29:21,252 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 20:29:26,546 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 20:29:26,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2833520.0, ans=0.5 2024-08-14 20:29:27,914 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 20:29:43,234 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.316e+01 2.668e+01 3.025e+01 4.748e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 20:29:52,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2833720.0, ans=0.125 2024-08-14 20:29:58,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2833720.0, ans=0.1 2024-08-14 20:30:00,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.74 vs. 
limit=15.0 2024-08-14 20:30:14,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2024-08-14 20:30:20,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8050, loss[loss=0.1083, beats_loss=0.01085, ecapa_loss=0.000131, whisper_loss=0.09617, over 23398.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001522, whisper_loss=0.09029, over 3887814.15 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:30:35,011 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 20:30:48,203 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 27 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-14 20:30:54,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=8.0 2024-08-14 20:30:59,573 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 20:31:09,216 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-14 20:31:17,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2834320.0, ans=0.125 2024-08-14 20:31:18,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2834320.0, ans=0.125 2024-08-14 20:31:26,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. 
limit=15.0 2024-08-14 20:31:27,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2834320.0, ans=0.0 2024-08-14 20:31:31,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8100, loss[loss=0.09681, beats_loss=0.01089, ecapa_loss=0.0001446, whisper_loss=0.08448, over 21097.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.0001529, whisper_loss=0.09063, over 3868122.21 frames. ], batch size: 84, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:31:33,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2834420.0, ans=0.0 2024-08-14 20:31:53,534 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 20:31:56,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2834520.0, ans=0.0 2024-08-14 20:32:02,397 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 20:32:06,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.290e+01 2.522e+01 2.889e+01 4.208e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 20:32:15,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2834720.0, ans=0.125 2024-08-14 20:32:31,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2834820.0, ans=0.0 2024-08-14 20:32:45,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8150, loss[loss=0.1216, beats_loss=0.009976, ecapa_loss=0.0001499, whisper_loss=0.1101, over 22314.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001535, whisper_loss=0.09152, over 3900977.05 frames. 
], batch size: 89, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:33:06,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2835020.0, ans=0.0 2024-08-14 20:33:07,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2835020.0, ans=0.04949747468305833 2024-08-14 20:33:13,133 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 20:33:13,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-08-14 20:33:19,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-08-14 20:33:40,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.21 vs. limit=15.0 2024-08-14 20:33:43,719 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 20:33:55,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2835320.0, ans=0.0 2024-08-14 20:33:58,494 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8200, loss[loss=0.1018, beats_loss=0.01079, ecapa_loss=0.0001413, whisper_loss=0.08964, over 23121.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001543, whisper_loss=0.09117, over 3900744.41 frames. ], batch size: 90, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:34:03,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.67 vs. 
limit=12.0 2024-08-14 20:34:14,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2835520.0, ans=0.125 2024-08-14 20:34:25,943 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 20:34:27,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2835620.0, ans=0.125 2024-08-14 20:34:33,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.288e+01 2.494e+01 2.883e+01 1.855e+02, threshold=4.988e+01, percent-clipped=1.0 2024-08-14 20:34:45,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2835720.0, ans=0.1 2024-08-14 20:34:47,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2835720.0, ans=0.125 2024-08-14 20:34:51,516 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 20:34:53,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2835720.0, ans=0.0 2024-08-14 20:35:10,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8250, loss[loss=0.09322, beats_loss=0.01284, ecapa_loss=0.000129, whisper_loss=0.07909, over 21525.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001543, whisper_loss=0.09089, over 3932704.44 frames. ], batch size: 83, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:35:15,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2835920.0, ans=0.125 2024-08-14 20:35:16,596 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-14 20:35:21,370 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-14 20:36:14,588 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 20:36:23,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8300, loss[loss=0.1189, beats_loss=0.01108, ecapa_loss=0.000129, whisper_loss=0.1065, over 21632.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001535, whisper_loss=0.09041, over 3932179.17 frames. ], batch size: 84, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:36:34,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2836420.0, ans=0.125 2024-08-14 20:36:48,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=12.0 2024-08-14 20:36:51,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=22.5 2024-08-14 20:36:57,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=12.0 2024-08-14 20:36:57,779 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.392e+01 2.726e+01 3.062e+01 2.103e+02, threshold=5.453e+01, percent-clipped=2.0 2024-08-14 20:37:01,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2836620.0, ans=0.0 2024-08-14 20:37:04,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2836620.0, ans=0.04949747468305833 2024-08-14 20:37:07,946 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 20:37:10,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2836720.0, ans=0.2 2024-08-14 20:37:23,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2836820.0, ans=0.125 2024-08-14 20:37:34,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8350, loss[loss=0.09496, beats_loss=0.01115, ecapa_loss=0.0001459, whisper_loss=0.08236, over 21342.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001548, whisper_loss=0.09045, over 3922562.45 frames. ], batch size: 81, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:37:45,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0 2024-08-14 20:37:52,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2837020.0, ans=0.1 2024-08-14 20:37:53,591 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
16 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 20:37:53,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2837020.0, ans=0.1 2024-08-14 20:37:58,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2837020.0, ans=0.1 2024-08-14 20:38:04,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2837120.0, ans=0.125 2024-08-14 20:38:14,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2837120.0, ans=0.2 2024-08-14 20:38:21,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2837220.0, ans=0.0 2024-08-14 20:38:29,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2837220.0, ans=0.1 2024-08-14 20:38:31,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.69 vs. limit=15.0 2024-08-14 20:38:32,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2837320.0, ans=0.125 2024-08-14 20:38:41,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2837320.0, ans=0.125 2024-08-14 20:38:46,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8400, loss[loss=0.1085, beats_loss=0.008337, ecapa_loss=0.0001786, whisper_loss=0.09843, over 17113.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001543, whisper_loss=0.09023, over 3908832.00 frames. 
], batch size: 68, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:39:01,767 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 20:39:06,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2837520.0, ans=22.5 2024-08-14 20:39:09,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2837520.0, ans=0.0 2024-08-14 20:39:20,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-14 20:39:22,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.308e+01 2.540e+01 2.813e+01 3.907e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 20:39:59,895 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8450, loss[loss=0.08982, beats_loss=0.01128, ecapa_loss=0.0001453, whisper_loss=0.07708, over 15422.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001531, whisper_loss=0.09019, over 3926874.03 frames. ], batch size: 60, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:40:00,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-08-14 20:40:13,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2838020.0, ans=0.125 2024-08-14 20:40:15,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. 
limit=6.0 2024-08-14 20:40:33,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2838120.0, ans=0.2 2024-08-14 20:40:41,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2838220.0, ans=0.125 2024-08-14 20:40:44,707 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:41:02,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2838320.0, ans=0.0 2024-08-14 20:41:11,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8500, loss[loss=0.1117, beats_loss=0.008101, ecapa_loss=0.0001874, whisper_loss=0.1018, over 16115.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001541, whisper_loss=0.08926, over 3922960.46 frames. ], batch size: 64, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:41:12,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2838420.0, ans=0.2 2024-08-14 20:41:13,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2838420.0, ans=0.125 2024-08-14 20:41:13,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2838420.0, ans=0.0 2024-08-14 20:41:13,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2838420.0, ans=0.125 2024-08-14 20:41:13,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. 
limit=15.0 2024-08-14 20:41:19,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2838420.0, ans=0.2 2024-08-14 20:41:20,457 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 20:41:20,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2838420.0, ans=0.2 2024-08-14 20:41:24,474 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 20:41:45,637 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.375e+01 2.644e+01 3.031e+01 3.106e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 20:41:49,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2838620.0, ans=0.0 2024-08-14 20:41:50,173 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 20:42:02,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.48 vs. limit=10.0 2024-08-14 20:42:03,055 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 20:42:10,078 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 20:42:17,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2838820.0, ans=0.0 2024-08-14 20:42:18,452 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 20:42:22,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8550, loss[loss=0.1101, beats_loss=0.008149, ecapa_loss=0.0001912, whisper_loss=0.1001, over 22000.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01078, ecapa_loss=0.0001532, whisper_loss=0.08933, over 3910072.24 frames. ], batch size: 90, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:42:24,655 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 20:42:55,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.42 vs. limit=22.5 2024-08-14 20:43:13,543 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 20:43:15,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2839220.0, ans=0.125 2024-08-14 20:43:25,165 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 20:43:28,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=12.0 2024-08-14 20:43:35,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8600, loss[loss=0.1166, beats_loss=0.008632, ecapa_loss=0.0001537, whisper_loss=0.1064, over 17720.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01072, ecapa_loss=0.0001531, whisper_loss=0.08973, over 3907931.82 frames. ], batch size: 69, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:43:36,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2839420.0, ans=0.0 2024-08-14 20:43:40,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2839420.0, ans=0.0 2024-08-14 20:43:59,244 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 20:44:02,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2839520.0, ans=0.2 2024-08-14 20:44:10,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.454e+01 2.758e+01 3.025e+01 4.750e+01, threshold=5.517e+01, percent-clipped=0.0 2024-08-14 20:44:20,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2839720.0, ans=0.0 2024-08-14 20:44:28,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2839720.0, ans=0.0 2024-08-14 20:44:28,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2024-08-14 20:44:39,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2839820.0, ans=0.0 2024-08-14 20:44:49,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8650, loss[loss=0.1103, beats_loss=0.007079, ecapa_loss=0.0001921, whisper_loss=0.1013, over 16467.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001534, whisper_loss=0.09006, over 3884764.70 frames. ], batch size: 63, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:44:59,327 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-284000.pt 2024-08-14 20:45:10,034 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 20:45:14,733 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 20:45:30,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2024-08-14 20:45:30,948 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 20:45:31,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2840120.0, ans=0.1 2024-08-14 20:45:54,200 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 20:46:01,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2840320.0, ans=0.1 2024-08-14 20:46:05,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8700, loss[loss=0.09786, beats_loss=0.01133, ecapa_loss=0.0001632, whisper_loss=0.0849, over 19155.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001535, whisper_loss=0.09067, over 3887101.34 frames. ], batch size: 79, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:46:05,504 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 20:46:21,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2840520.0, ans=0.125 2024-08-14 20:46:33,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2840620.0, ans=0.125 2024-08-14 20:46:39,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.465e+01 2.655e+01 3.081e+01 6.274e+01, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:46:50,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2840720.0, ans=0.125 2024-08-14 20:47:09,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2840820.0, ans=0.1 2024-08-14 20:47:11,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2840820.0, ans=0.125 2024-08-14 20:47:13,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2840820.0, ans=0.125 2024-08-14 20:47:17,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8750, loss[loss=0.09607, beats_loss=0.01184, ecapa_loss=0.0001386, whisper_loss=0.08284, over 21886.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001544, whisper_loss=0.09079, over 3871826.45 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:47:17,444 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
26 from LS+wenet, 18 from Vox, 27 from AS 2024-08-14 20:47:31,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841020.0, ans=0.1 2024-08-14 20:48:26,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=12.0 2024-08-14 20:48:29,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8800, loss[loss=0.09638, beats_loss=0.01128, ecapa_loss=0.0001393, whisper_loss=0.08371, over 16443.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001548, whisper_loss=0.09125, over 3904476.31 frames. ], batch size: 64, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:48:30,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-14 20:48:45,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2841520.0, ans=0.125 2024-08-14 20:49:05,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.267e+01 2.535e+01 2.766e+01 4.137e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-14 20:49:05,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2841620.0, ans=0.125 2024-08-14 20:49:08,184 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 30 from Vox, 27 from AS 2024-08-14 20:49:10,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2841620.0, ans=0.09899494936611666 2024-08-14 20:49:24,647 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 from AS 2024-08-14 20:49:25,828 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
30 from LS+wenet, 19 from Vox, 30 from AS 2024-08-14 20:49:36,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841820.0, ans=0.1 2024-08-14 20:49:42,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2024-08-14 20:49:43,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8850, loss[loss=0.08939, beats_loss=0.01285, ecapa_loss=0.0001237, whisper_loss=0.0753, over 15413.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001544, whisper_loss=0.09069, over 3864661.11 frames. ], batch size: 63, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:49:45,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2841920.0, ans=0.125 2024-08-14 20:49:48,264 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:49:49,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-14 20:50:07,642 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 from AS 2024-08-14 20:50:28,742 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 from AS 2024-08-14 20:50:33,706 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.932e+00 2024-08-14 20:50:40,564 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 26 from Vox, 34 from AS 2024-08-14 20:50:47,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2842320.0, ans=0.1 2024-08-14 20:50:51,786 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 from AS 2024-08-14 20:50:54,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8900, loss[loss=0.1098, beats_loss=0.01175, ecapa_loss=0.0001471, whisper_loss=0.09659, over 22769.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001539, whisper_loss=0.0909, over 3900386.19 frames. ], batch size: 91, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:51:03,405 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 from AS 2024-08-14 20:51:11,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2842520.0, ans=0.125 2024-08-14 20:51:11,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2842520.0, ans=0.2 2024-08-14 20:51:19,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.03 vs. limit=15.0 2024-08-14 20:51:23,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2842620.0, ans=0.2 2024-08-14 20:51:29,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.335e+01 2.555e+01 2.826e+01 4.520e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-14 20:51:33,951 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 13 from Vox, 32 from AS 2024-08-14 20:51:35,353 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 19 from Vox, 33 from AS 2024-08-14 20:51:47,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-14 20:51:48,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2842720.0, ans=0.05 2024-08-14 20:52:06,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 8950, loss[loss=0.1038, beats_loss=0.01268, ecapa_loss=0.0001259, whisper_loss=0.08983, over 16973.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001535, whisper_loss=0.09098, over 3897991.87 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:52:47,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2024-08-14 20:52:54,634 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 from AS 2024-08-14 20:52:58,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-14 20:53:09,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2843320.0, ans=0.125 2024-08-14 20:53:18,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9000, loss[loss=0.08156, beats_loss=0.01104, ecapa_loss=0.0001493, whisper_loss=0.06903, over 21055.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.000153, whisper_loss=0.08984, over 3870419.73 frames. 
], batch size: 86, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:53:18,858 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 20:54:01,024 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005268, whisper_loss=0.2474, over 922467.00 frames. 2024-08-14 20:54:16,990 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-14 20:56:16,659 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on AT_audioset: loss=0.0236, beats_loss=0.0236, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 20:56:16,663 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 20:56:17,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2843420.0, ans=0.125 2024-08-14 20:56:21,362 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS 2024-08-14 20:56:28,363 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 25 from Vox, 22 from AS 2024-08-14 20:56:47,866 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 from AS 2024-08-14 20:56:51,215 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:56:51,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.250e+01 2.510e+01 2.872e+01 4.631e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-14 20:56:55,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.24 vs. 
limit=15.0 2024-08-14 20:57:12,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2843720.0, ans=0.1 2024-08-14 20:57:23,027 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 20:57:24,606 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 from AS 2024-08-14 20:57:26,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2843820.0, ans=0.125 2024-08-14 20:57:29,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9050, loss[loss=0.1143, beats_loss=0.01089, ecapa_loss=0.0001725, whisper_loss=0.1016, over 19168.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001525, whisper_loss=0.09085, over 3875559.57 frames. ], batch size: 79, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:57:30,207 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 from AS 2024-08-14 20:57:34,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2024-08-14 20:57:35,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2843920.0, ans=0.0 2024-08-14 20:57:35,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2843920.0, ans=0.0 2024-08-14 20:57:39,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2843920.0, ans=0.1 2024-08-14 20:57:43,685 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS 2024-08-14 20:57:44,939 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
22 from LS+wenet, 24 from Vox, 41 from AS 2024-08-14 20:57:58,185 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:58:00,630 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 20 from Vox, 17 from AS 2024-08-14 20:58:25,942 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 from AS 2024-08-14 20:58:43,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9100, loss[loss=0.09303, beats_loss=0.009372, ecapa_loss=0.0001929, whisper_loss=0.08173, over 17506.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.000153, whisper_loss=0.09028, over 3854029.55 frames. ], batch size: 72, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:58:58,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2844520.0, ans=0.125 2024-08-14 20:58:59,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5 2024-08-14 20:59:06,629 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 28 from Vox, 24 from AS 2024-08-14 20:59:13,825 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-14 20:59:17,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.420e+01 2.655e+01 2.997e+01 1.110e+02, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:59:19,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.34 vs. 
limit=10.0 2024-08-14 20:59:21,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2844620.0, ans=0.125 2024-08-14 20:59:39,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2844820.0, ans=0.0 2024-08-14 20:59:40,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.34 vs. limit=10.0 2024-08-14 20:59:42,506 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 from AS 2024-08-14 20:59:45,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2844820.0, ans=0.125 2024-08-14 20:59:55,165 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9150, loss[loss=0.1064, beats_loss=0.01024, ecapa_loss=0.0001459, whisper_loss=0.09474, over 22350.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001536, whisper_loss=0.0905, over 3875897.47 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:59:56,809 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 20:59:59,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2844920.0, ans=0.1 2024-08-14 21:00:07,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2844920.0, ans=0.0 2024-08-14 21:00:19,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2845020.0, ans=0.125 2024-08-14 21:00:36,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2845120.0, ans=0.0 2024-08-14 21:00:40,123 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 14 from Vox, 36 from AS 2024-08-14 21:00:47,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2845220.0, ans=0.2 2024-08-14 21:00:47,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2845220.0, ans=0.1 2024-08-14 21:00:48,852 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS 2024-08-14 21:00:54,723 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 16 from Vox, 30 from AS 2024-08-14 21:00:55,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-08-14 21:01:07,136 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9200, loss[loss=0.08829, beats_loss=0.0126, ecapa_loss=0.0001574, whisper_loss=0.07411, over 22408.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001539, whisper_loss=0.09065, over 3871274.04 frames. 
], batch size: 96, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:01:28,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2845520.0, ans=0.125 2024-08-14 21:01:41,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.415e+01 2.661e+01 2.941e+01 2.596e+02, threshold=5.321e+01, percent-clipped=3.0 2024-08-14 21:01:42,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2024-08-14 21:01:44,672 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 24 from Vox, 40 from AS 2024-08-14 21:01:46,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2845620.0, ans=0.125 2024-08-14 21:02:11,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2024-08-14 21:02:11,918 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 23 from Vox, 23 from AS 2024-08-14 21:02:18,818 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9250, loss[loss=0.105, beats_loss=0.01058, ecapa_loss=0.0001658, whisper_loss=0.09278, over 14470.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09016, over 3860839.23 frames. 
], batch size: 59, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:02:22,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2845920.0, ans=0.125 2024-08-14 21:02:25,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2845920.0, ans=0.04949747468305833 2024-08-14 21:02:41,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2846020.0, ans=0.125 2024-08-14 21:02:43,212 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 from AS 2024-08-14 21:02:54,843 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 from AS 2024-08-14 21:02:59,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.79 vs. limit=22.5 2024-08-14 21:03:03,559 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 28 from Vox, 33 from AS 2024-08-14 21:03:05,139 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:03:11,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2846220.0, ans=0.0 2024-08-14 21:03:15,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-14 21:03:27,722 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 from AS 2024-08-14 21:03:30,887 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
22 from LS+wenet, 16 from Vox, 32 from AS 2024-08-14 21:03:33,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9300, loss[loss=0.09803, beats_loss=0.01272, ecapa_loss=0.0001316, whisper_loss=0.08399, over 21415.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001546, whisper_loss=0.09022, over 3862396.22 frames. ], batch size: 85, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:03:35,842 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 from AS 2024-08-14 21:03:39,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2024-08-14 21:03:51,793 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 24 from Vox, 23 from AS 2024-08-14 21:04:03,494 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 31 from Vox, 29 from AS 2024-08-14 21:04:05,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-14 21:04:08,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.351e+01 2.533e+01 2.913e+01 3.870e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-14 21:04:14,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2024-08-14 21:04:22,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2846720.0, ans=0.125 2024-08-14 21:04:41,299 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 24 from Vox, 26 from AS 2024-08-14 21:04:42,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.57 vs. 
limit=22.5 2024-08-14 21:04:48,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9350, loss[loss=0.1123, beats_loss=0.01072, ecapa_loss=0.0001491, whisper_loss=0.1001, over 22753.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001559, whisper_loss=0.09102, over 3887924.42 frames. ], batch size: 88, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:04:55,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2846920.0, ans=0.0 2024-08-14 21:04:56,455 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.935e+01 2024-08-14 21:05:13,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2847020.0, ans=0.125 2024-08-14 21:05:37,223 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 21:05:37,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-14 21:05:44,679 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:06:01,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9400, loss[loss=0.1099, beats_loss=0.008132, ecapa_loss=0.0001773, whisper_loss=0.09999, over 19096.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001562, whisper_loss=0.09142, over 3882465.49 frames. 
], batch size: 77, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:06:02,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2847420.0, ans=0.07 2024-08-14 21:06:04,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.74 vs. limit=6.0 2024-08-14 21:06:06,719 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 26 from Vox, 29 from AS 2024-08-14 21:06:17,619 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 from AS 2024-08-14 21:06:30,797 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 from AS 2024-08-14 21:06:37,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2847620.0, ans=0.125 2024-08-14 21:06:38,196 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.317e+01 2.592e+01 2.927e+01 3.881e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-14 21:06:43,969 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 from AS 2024-08-14 21:06:49,717 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 14 from Vox, 27 from AS 2024-08-14 21:06:55,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2847720.0, ans=0.0 2024-08-14 21:07:00,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2847820.0, ans=0.0 2024-08-14 21:07:13,964 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 19 from Vox, 27 from AS 2024-08-14 21:07:15,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9450, loss[loss=0.0948, beats_loss=0.01104, ecapa_loss=0.0001539, whisper_loss=0.08222, over 15780.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001562, whisper_loss=0.0912, over 3841665.99 frames. ], batch size: 63, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:07:16,914 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS 2024-08-14 21:07:18,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2847920.0, ans=0.0 2024-08-14 21:07:21,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2847920.0, ans=0.125 2024-08-14 21:07:29,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5 2024-08-14 21:07:47,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2848120.0, ans=0.1 2024-08-14 21:07:50,617 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 19 from Vox, 47 from AS 2024-08-14 21:08:04,208 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 from AS 2024-08-14 21:08:06,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=12.0 2024-08-14 21:08:21,259 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
30 from LS+wenet, 21 from Vox, 43 from AS 2024-08-14 21:08:21,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=2848320.0, ans=0.02 2024-08-14 21:08:23,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2848320.0, ans=0.125 2024-08-14 21:08:28,289 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9500, loss[loss=0.09223, beats_loss=0.01257, ecapa_loss=0.0001696, whisper_loss=0.07797, over 21467.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.000156, whisper_loss=0.09056, over 3864563.66 frames. ], batch size: 91, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:08:52,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2848520.0, ans=0.125 2024-08-14 21:08:58,445 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 from AS 2024-08-14 21:09:03,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.327e+01 2.619e+01 2.918e+01 1.778e+02, threshold=5.238e+01, percent-clipped=2.0 2024-08-14 21:09:07,042 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-14 21:09:10,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2848620.0, ans=0.125 2024-08-14 21:09:32,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2024-08-14 21:09:42,074 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9550, loss[loss=0.1027, beats_loss=0.009279, ecapa_loss=0.0001659, whisper_loss=0.09179, over 22467.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001567, whisper_loss=0.09048, over 3848650.19 frames. ], batch size: 90, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:09:46,819 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 from AS 2024-08-14 21:09:58,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2849020.0, ans=0.0 2024-08-14 21:10:11,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2849120.0, ans=0.1 2024-08-14 21:10:16,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.86 vs. limit=10.0 2024-08-14 21:10:19,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2849120.0, ans=0.0 2024-08-14 21:10:25,848 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 from AS 2024-08-14 21:10:30,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2849220.0, ans=0.125 2024-08-14 21:10:43,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2849320.0, ans=0.125 2024-08-14 21:10:47,234 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 16 from Vox, 32 from AS 2024-08-14 21:10:52,794 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 from AS 2024-08-14 21:10:53,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9600, loss[loss=0.09984, beats_loss=0.009667, ecapa_loss=0.0001556, whisper_loss=0.08862, over 20057.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001557, whisper_loss=0.09032, over 3848638.55 frames. 
], batch size: 81, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:11:03,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2849420.0, ans=0.125 2024-08-14 21:11:05,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2849420.0, ans=0.2 2024-08-14 21:11:21,125 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 from AS 2024-08-14 21:11:26,909 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 from AS 2024-08-14 21:11:27,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2849620.0, ans=0.2 2024-08-14 21:11:29,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.324e+01 2.593e+01 2.905e+01 4.004e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 21:11:31,525 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 from AS 2024-08-14 21:11:40,186 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 from AS 2024-08-14 21:11:40,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2849720.0, ans=0.125 2024-08-14 21:11:58,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. limit=10.0 2024-08-14 21:12:07,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9650, loss[loss=0.1202, beats_loss=0.009884, ecapa_loss=0.0001433, whisper_loss=0.1089, over 19359.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001555, whisper_loss=0.08979, over 3816974.72 frames. 
], batch size: 73, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:12:11,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-14 21:12:30,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2850020.0, ans=0.2 2024-08-14 21:12:31,724 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 20 from Vox, 18 from AS 2024-08-14 21:12:32,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2850020.0, ans=0.125 2024-08-14 21:13:09,329 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 25 from Vox, 36 from AS 2024-08-14 21:13:20,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9700, loss[loss=0.09932, beats_loss=0.01307, ecapa_loss=0.0001325, whisper_loss=0.08493, over 21190.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.000157, whisper_loss=0.08971, over 3826925.77 frames. 
], batch size: 87, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:13:26,965 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:13:37,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2850520.0, ans=0.125 2024-08-14 21:13:39,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2850520.0, ans=0.125 2024-08-14 21:13:45,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2850520.0, ans=0.5 2024-08-14 21:13:51,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2850620.0, ans=0.125 2024-08-14 21:13:51,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2850620.0, ans=0.125 2024-08-14 21:13:56,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.346e+01 2.562e+01 2.964e+01 3.831e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-14 21:14:02,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2850620.0, ans=0.2 2024-08-14 21:14:08,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2850720.0, ans=0.2 2024-08-14 21:14:17,593 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 from AS 2024-08-14 21:14:34,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9750, loss[loss=0.1003, beats_loss=0.01186, ecapa_loss=0.000117, whisper_loss=0.0873, over 17454.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001555, whisper_loss=0.08968, over 3832668.86 frames. 
], batch size: 66, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:14:47,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2850920.0, ans=0.1 2024-08-14 21:14:49,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-14 21:14:50,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851020.0, ans=0.1 2024-08-14 21:15:02,220 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 22 from Vox, 43 from AS 2024-08-14 21:15:14,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2851120.0, ans=0.5 2024-08-14 21:15:35,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2851320.0, ans=0.125 2024-08-14 21:15:41,121 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 from AS 2024-08-14 21:15:48,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2851420.0, ans=0.1 2024-08-14 21:15:49,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9800, loss[loss=0.09264, beats_loss=0.01128, ecapa_loss=0.0001615, whisper_loss=0.07974, over 21519.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001544, whisper_loss=0.0905, over 3841842.76 frames. ], batch size: 88, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:15:55,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851420.0, ans=0.1 2024-08-14 21:15:57,065 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
29 from LS+wenet, 21 from Vox, 36 from AS 2024-08-14 21:16:04,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-14 21:16:14,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2851520.0, ans=0.2 2024-08-14 21:16:17,951 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 14 from Vox, 32 from AS 2024-08-14 21:16:25,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.297e+01 2.616e+01 2.876e+01 8.897e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 21:16:28,877 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS 2024-08-14 21:16:34,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851720.0, ans=0.1 2024-08-14 21:16:35,918 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS 2024-08-14 21:16:36,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2851720.0, ans=0.0 2024-08-14 21:16:39,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=8.0 2024-08-14 21:17:03,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9850, loss[loss=0.1116, beats_loss=0.0104, ecapa_loss=0.0001679, whisper_loss=0.09956, over 20343.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001545, whisper_loss=0.09045, over 3837099.93 frames. ], batch size: 86, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:17:08,578 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 from AS 2024-08-14 21:17:10,250 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
25 from LS+wenet, 10 from Vox, 30 from AS 2024-08-14 21:17:29,359 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 from AS 2024-08-14 21:17:44,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2024-08-14 21:17:49,885 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 21:18:03,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.41 vs. limit=6.0 2024-08-14 21:18:16,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2852420.0, ans=0.2 2024-08-14 21:18:18,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9900, loss[loss=0.09905, beats_loss=0.01109, ecapa_loss=0.0001454, whisper_loss=0.0865, over 16189.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001543, whisper_loss=0.09128, over 3840466.50 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:18:20,220 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
27 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-14 21:18:32,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2852520.0, ans=0.1 2024-08-14 21:18:54,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.391e+01 2.621e+01 2.869e+01 9.364e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-14 21:18:54,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2852620.0, ans=0.125 2024-08-14 21:18:58,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2024-08-14 21:19:03,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2852720.0, ans=0.1 2024-08-14 21:19:06,530 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 21:19:15,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2852720.0, ans=0.125 2024-08-14 21:19:16,163 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 21:19:29,338 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.848e+05 2024-08-14 21:19:32,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2852820.0, ans=0.125 2024-08-14 21:19:35,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 9950, loss[loss=0.1129, beats_loss=0.008109, ecapa_loss=0.0001576, whisper_loss=0.1032, over 16963.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001549, whisper_loss=0.0911, over 3821039.22 frames. 
], batch size: 64, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:19:55,072 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 21:19:56,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2853020.0, ans=0.125 2024-08-14 21:20:16,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-14 21:20:46,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2853320.0, ans=0.1 2024-08-14 21:20:47,878 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:20:51,972 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10000, loss[loss=0.1139, beats_loss=0.01112, ecapa_loss=0.000109, whisper_loss=0.1017, over 21860.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.000154, whisper_loss=0.09156, over 3856282.81 frames. ], batch size: 83, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:21:02,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2853420.0, ans=0.0 2024-08-14 21:21:11,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2853520.0, ans=0.125 2024-08-14 21:21:18,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2853520.0, ans=0.1 2024-08-14 21:21:24,559 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 21:21:28,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.381e+01 2.626e+01 2.960e+01 1.740e+02, threshold=5.252e+01, percent-clipped=1.0 2024-08-14 21:21:50,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2853720.0, ans=0.125 2024-08-14 21:22:08,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2853920.0, ans=0.125 2024-08-14 21:22:08,827 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10050, loss[loss=0.09661, beats_loss=0.01026, ecapa_loss=0.0001464, whisper_loss=0.08489, over 19551.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001533, whisper_loss=0.092, over 3873457.34 frames. ], batch size: 77, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:22:13,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2853920.0, ans=0.0 2024-08-14 21:22:14,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.30 vs. limit=22.5 2024-08-14 21:22:33,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.19 vs. limit=10.0 2024-08-14 21:22:41,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2854120.0, ans=0.125 2024-08-14 21:22:41,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2854120.0, ans=0.0 2024-08-14 21:23:00,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.24 vs. 
limit=22.5 2024-08-14 21:23:23,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2854320.0, ans=0.125 2024-08-14 21:23:23,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2854320.0, ans=0.125 2024-08-14 21:23:28,129 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.447e-03 2024-08-14 21:23:29,307 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 21:23:30,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10100, loss[loss=0.1046, beats_loss=0.01137, ecapa_loss=0.0001611, whisper_loss=0.09159, over 20776.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001529, whisper_loss=0.09132, over 3893565.83 frames. ], batch size: 87, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:23:39,311 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 21:23:42,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2854420.0, ans=0.2 2024-08-14 21:23:42,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2024-08-14 21:24:00,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.19 vs. 
limit=15.0 2024-08-14 21:24:01,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2854520.0, ans=0.0 2024-08-14 21:24:04,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2854620.0, ans=0.1 2024-08-14 21:24:10,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.362e+01 2.668e+01 2.989e+01 1.433e+02, threshold=5.336e+01, percent-clipped=3.0 2024-08-14 21:24:25,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2854720.0, ans=0.125 2024-08-14 21:24:27,565 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:24:35,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2854820.0, ans=0.1 2024-08-14 21:24:52,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10150, loss[loss=0.0841, beats_loss=0.01072, ecapa_loss=0.0001655, whisper_loss=0.07172, over 18076.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001539, whisper_loss=0.09139, over 3906657.21 frames. ], batch size: 75, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:24:53,632 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 29 from Vox, 21 fro AS 2024-08-14 21:24:58,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2854920.0, ans=0.0 2024-08-14 21:25:28,827 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 21:26:01,778 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 25 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-14 21:26:09,557 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
15 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 21:26:10,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10200, loss[loss=0.08244, beats_loss=0.01225, ecapa_loss=0.00013, whisper_loss=0.06888, over 16024.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001529, whisper_loss=0.09073, over 3902554.88 frames. ], batch size: 64, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:26:32,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2855520.0, ans=0.0 2024-08-14 21:26:46,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.377e+01 2.660e+01 3.071e+01 4.492e+01, threshold=5.321e+01, percent-clipped=0.0 2024-08-14 21:27:20,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2855820.0, ans=0.1 2024-08-14 21:27:21,359 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-14 21:27:23,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10250, loss[loss=0.07986, beats_loss=0.0123, ecapa_loss=0.0001822, whisper_loss=0.06574, over 19488.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001534, whisper_loss=0.09107, over 3911405.90 frames. ], batch size: 84, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:27:58,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2856120.0, ans=0.0 2024-08-14 21:28:10,195 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
21 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-14 21:28:19,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2856220.0, ans=0.015 2024-08-14 21:28:29,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2856320.0, ans=0.125 2024-08-14 21:28:34,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-14 21:28:38,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10300, loss[loss=0.1, beats_loss=0.0114, ecapa_loss=0.0001158, whisper_loss=0.08749, over 22920.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.000153, whisper_loss=0.09089, over 3878837.39 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:29:01,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=22.5 2024-08-14 21:29:14,012 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.284e+01 2.585e+01 2.983e+01 4.241e+01, threshold=5.169e+01, percent-clipped=0.0 2024-08-14 21:29:16,292 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 21:29:17,440 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-14 21:29:25,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2856720.0, ans=0.125 2024-08-14 21:29:40,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.03 vs. 
limit=15.0 2024-08-14 21:29:47,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2856820.0, ans=0.04949747468305833 2024-08-14 21:29:52,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-14 21:29:53,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10350, loss[loss=0.08496, beats_loss=0.01148, ecapa_loss=0.0001114, whisper_loss=0.07236, over 18864.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001517, whisper_loss=0.09074, over 3880834.85 frames. ], batch size: 73, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:29:54,781 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 21:30:27,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2857120.0, ans=0.0 2024-08-14 21:30:33,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2857120.0, ans=0.125 2024-08-14 21:30:39,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2857220.0, ans=0.1 2024-08-14 21:31:29,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2024-08-14 21:31:31,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10400, loss[loss=0.1153, beats_loss=0.0109, ecapa_loss=0.0001441, whisper_loss=0.103, over 15688.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001524, whisper_loss=0.09051, over 3855039.15 frames. 
], batch size: 63, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:31:49,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2857520.0, ans=0.0 2024-08-14 21:32:07,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2857620.0, ans=0.0 2024-08-14 21:32:08,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2857620.0, ans=0.1 2024-08-14 21:32:14,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.400e+01 2.611e+01 2.963e+01 4.216e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-14 21:32:38,199 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 21:32:41,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2857820.0, ans=0.125 2024-08-14 21:32:50,707 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.232e+01 2024-08-14 21:32:59,980 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10450, loss[loss=0.1129, beats_loss=0.009956, ecapa_loss=0.0001631, whisper_loss=0.1013, over 23628.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001539, whisper_loss=0.0904, over 3849897.77 frames. ], batch size: 95, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:33:00,218 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 21:33:15,910 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 21:33:21,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2858020.0, ans=0.0 2024-08-14 21:33:21,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2858020.0, ans=0.0 2024-08-14 21:33:21,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2858020.0, ans=0.125 2024-08-14 21:33:58,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2858220.0, ans=0.125 2024-08-14 21:34:02,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2858220.0, ans=0.0 2024-08-14 21:34:17,913 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 21:34:29,176 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10500, loss[loss=0.1015, beats_loss=0.0122, ecapa_loss=0.0001262, whisper_loss=0.08806, over 19872.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001535, whisper_loss=0.09037, over 3836965.82 frames. ], batch size: 78, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:34:59,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2858520.0, ans=0.125 2024-08-14 21:35:11,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.314e+01 2.587e+01 2.967e+01 4.494e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-14 21:35:22,341 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 21:35:41,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2858820.0, ans=0.0 2024-08-14 21:35:45,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=15.0 2024-08-14 21:35:56,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10550, loss[loss=0.1056, beats_loss=0.00759, ecapa_loss=0.0002036, whisper_loss=0.09594, over 21580.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001545, whisper_loss=0.09059, over 3860846.90 frames. ], batch size: 93, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:36:12,270 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 21:36:43,081 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 21:36:47,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2859120.0, ans=0.05 2024-08-14 21:36:52,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2859220.0, ans=0.09899494936611666 2024-08-14 21:36:59,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2859220.0, ans=0.125 2024-08-14 21:37:14,679 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 21:37:24,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.18 vs. 
limit=15.0 2024-08-14 21:37:25,087 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10600, loss[loss=0.0965, beats_loss=0.01264, ecapa_loss=0.0001462, whisper_loss=0.0824, over 19904.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001554, whisper_loss=0.0908, over 3886753.81 frames. ], batch size: 82, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:37:29,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2859420.0, ans=0.125 2024-08-14 21:37:49,754 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 21:38:03,100 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 21:38:07,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.377e+01 2.615e+01 3.017e+01 5.904e+01, threshold=5.231e+01, percent-clipped=2.0 2024-08-14 21:38:07,347 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 21:38:26,575 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 21:38:30,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2859720.0, ans=0.0 2024-08-14 21:38:52,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10650, loss[loss=0.108, beats_loss=0.01152, ecapa_loss=0.000132, whisper_loss=0.09512, over 22977.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001545, whisper_loss=0.09105, over 3871169.69 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:39:11,787 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 21:39:35,197 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 21:39:48,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2860220.0, ans=0.0 2024-08-14 21:39:48,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2860220.0, ans=0.0 2024-08-14 21:40:15,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10700, loss[loss=0.08501, beats_loss=0.01293, ecapa_loss=0.0001423, whisper_loss=0.07066, over 21793.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001538, whisper_loss=0.09109, over 3886195.57 frames. ], batch size: 92, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:40:22,920 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 21:40:29,984 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 21:40:35,148 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:40:50,330 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:40:52,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2860620.0, ans=0.0 2024-08-14 21:40:57,179 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.610e+01 2.912e+01 4.621e+02, threshold=5.220e+01, percent-clipped=2.0 2024-08-14 21:41:03,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.48 vs. 
limit=22.5 2024-08-14 21:41:09,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2860720.0, ans=0.125 2024-08-14 21:41:19,352 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 21:41:40,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10750, loss[loss=0.09675, beats_loss=0.01042, ecapa_loss=0.0001742, whisper_loss=0.08458, over 22108.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001541, whisper_loss=0.09124, over 3907567.51 frames. ], batch size: 93, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:42:06,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2861020.0, ans=0.125 2024-08-14 21:42:18,195 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 21:43:00,389 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 21:43:09,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10800, loss[loss=0.08655, beats_loss=0.01124, ecapa_loss=0.0001395, whisper_loss=0.07391, over 15044.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001521, whisper_loss=0.09151, over 3915867.59 frames. 
], batch size: 59, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:43:24,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2861520.0, ans=0.0 2024-08-14 21:43:49,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.222e+01 2.555e+01 2.864e+01 4.186e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-14 21:43:51,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2861620.0, ans=0.125 2024-08-14 21:44:34,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10850, loss[loss=0.1065, beats_loss=0.009738, ecapa_loss=0.0001674, whisper_loss=0.09511, over 17887.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01058, ecapa_loss=0.0001538, whisper_loss=0.09225, over 3936040.76 frames. ], batch size: 72, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:44:49,081 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 21:45:06,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2862120.0, ans=0.125 2024-08-14 21:45:42,922 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 21:45:59,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10900, loss[loss=0.1143, beats_loss=0.009176, ecapa_loss=0.0001741, whisper_loss=0.1034, over 22866.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01062, ecapa_loss=0.0001542, whisper_loss=0.09176, over 3934267.35 frames. ], batch size: 92, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:46:06,516 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-14 21:46:13,584 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 13 from Vox, 33 from AS
2024-08-14 21:46:22,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2862520.0, ans=0.125
2024-08-14 21:46:27,076 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.387e-02
2024-08-14 21:46:39,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.360e+01 2.623e+01 2.983e+01 2.734e+02, threshold=5.246e+01, percent-clipped=0.0
2024-08-14 21:46:40,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2862620.0, ans=0.07
2024-08-14 21:46:46,081 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 12 from Vox, 21 from AS
2024-08-14 21:46:52,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.01 vs. limit=10.0
2024-08-14 21:47:09,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2862820.0, ans=0.0
2024-08-14 21:47:20,716 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 25 from Vox, 25 from AS
2024-08-14 21:47:25,514 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 10950, loss[loss=0.1082, beats_loss=0.00897, ecapa_loss=0.0001601, whisper_loss=0.09768, over 16313.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001537, whisper_loss=0.09153, over 3905256.75 frames. ], batch size: 63, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 21:47:25,751 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 from AS
2024-08-14 21:47:36,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2862920.0, ans=0.125
2024-08-14 21:47:55,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2863020.0, ans=0.125
2024-08-14 21:48:17,290 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 from AS
2024-08-14 21:48:17,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2863220.0, ans=0.0
2024-08-14 21:48:17,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0
2024-08-14 21:48:24,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2863220.0, ans=0.0
2024-08-14 21:48:30,607 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 23 from Vox, 24 from AS
2024-08-14 21:48:50,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11000, loss[loss=0.07896, beats_loss=0.009736, ecapa_loss=0.0001714, whisper_loss=0.06751, over 15128.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01051, ecapa_loss=0.0001542, whisper_loss=0.09212, over 3918070.88 frames. ], batch size: 60, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 21:48:54,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2863420.0, ans=0.0
2024-08-14 21:49:30,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.380e+01 2.630e+01 2.844e+01 1.265e+02, threshold=5.261e+01, percent-clipped=2.0
2024-08-14 21:49:47,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2863720.0, ans=0.1
2024-08-14 21:49:47,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2863720.0, ans=0.2
2024-08-14 21:50:06,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2863820.0, ans=0.125
2024-08-14 21:50:15,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11050, loss[loss=0.1068, beats_loss=0.01123, ecapa_loss=0.0001525, whisper_loss=0.09406, over 21905.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01057, ecapa_loss=0.0001549, whisper_loss=0.09227, over 3939524.67 frames. ], batch size: 89, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 21:50:19,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2863920.0, ans=0.025
2024-08-14 21:50:20,731 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 from AS
2024-08-14 21:50:24,309 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 21:50:25,745 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts.
17 from LS+wenet, 19 from Vox, 17 from AS
2024-08-14 21:50:34,147 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04806042090058327, model_norm_threshold=52.6092414855957
2024-08-14 21:50:34,343 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.750e+05, grad_sumsq=3.750e+05, orig_rms_sq=1.000e+00
2024-08-14 21:50:43,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2864020.0, ans=0.125
2024-08-14 21:50:48,171 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 from AS
2024-08-14 21:50:59,245 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 23 from Vox, 22 from AS
2024-08-14 21:51:39,824 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11100, loss[loss=0.0949, beats_loss=0.009446, ecapa_loss=0.0001451, whisper_loss=0.08401, over 17471.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.0001551, whisper_loss=0.09172, over 3916259.51 frames. ], batch size: 67, lr: 3.10e-03, grad_scale: 1.152921504606847e+18
2024-08-14 21:51:56,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5
2024-08-14 21:52:11,408 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 27 from Vox, 29 from AS
2024-08-14 21:52:19,195 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.297e+00
2024-08-14 21:52:19,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.327e+01 2.589e+01 2.870e+01 1.095e+03, threshold=5.179e+01, percent-clipped=2.0
2024-08-14 21:52:21,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2864620.0, ans=0.2
2024-08-14 21:52:27,661 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 from AS
2024-08-14 21:52:30,613 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 from AS
2024-08-14 21:52:34,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2864720.0, ans=0.125
2024-08-14 21:52:48,908 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS
2024-08-14 21:52:49,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2864820.0, ans=0.2
2024-08-14 21:52:51,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=15.0
2024-08-14 21:53:00,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0
2024-08-14 21:53:04,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11150, loss[loss=0.09217, beats_loss=0.01237, ecapa_loss=0.0001609, whisper_loss=0.07819, over 22268.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001543, whisper_loss=0.09102, over 3892135.15 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 1.152921504606847e+18
2024-08-14 21:53:04,916 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 from AS
2024-08-14 21:53:10,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2864920.0, ans=0.0
2024-08-14 21:53:15,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2864920.0, ans=0.0
2024-08-14 21:53:31,694 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 26 from Vox, 28 from AS
2024-08-14 21:53:38,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2865120.0, ans=0.125
2024-08-14 21:53:48,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865120.0, ans=0.1
2024-08-14 21:53:53,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2865220.0, ans=0.125
2024-08-14 21:54:02,279 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 from AS
2024-08-14 21:54:07,101 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS
2024-08-14 21:54:22,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2865320.0, ans=0.07
2024-08-14 21:54:28,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11200, loss[loss=0.1228, beats_loss=0.01003, ecapa_loss=0.0001641, whisper_loss=0.1111, over 21526.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001541, whisper_loss=0.09152, over 3893503.41 frames.
], batch size: 85, lr: 3.10e-03, grad_scale: 1.152921504606847e+18
2024-08-14 21:54:53,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2865520.0, ans=0.125
2024-08-14 21:55:01,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2865620.0, ans=0.125
2024-08-14 21:55:06,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2865620.0, ans=0.09899494936611666
2024-08-14 21:55:07,476 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.313e+01 2.527e+01 2.769e+01 5.053e+01, threshold=5.054e+01, percent-clipped=0.0
2024-08-14 21:55:20,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0
2024-08-14 21:55:36,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2865820.0, ans=0.125
2024-08-14 21:55:43,290 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.552e+05
2024-08-14 21:55:52,368 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11250, loss[loss=0.08495, beats_loss=0.01063, ecapa_loss=0.0001458, whisper_loss=0.07286, over 14006.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001543, whisper_loss=0.09143, over 3886295.27 frames. ], batch size: 54, lr: 3.10e-03, grad_scale: 1.152921504606847e+18
2024-08-14 21:55:52,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2865920.0, ans=10.0
2024-08-14 21:56:45,334 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 21:56:51,745 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 14 from Vox, 35 from AS
2024-08-14 21:57:00,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2866320.0, ans=0.125
2024-08-14 21:57:09,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=15.0
2024-08-14 21:57:18,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11300, loss[loss=0.09935, beats_loss=0.009634, ecapa_loss=0.0001967, whisper_loss=0.08775, over 15563.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.000154, whisper_loss=0.09068, over 3851290.35 frames. ], batch size: 64, lr: 3.10e-03, grad_scale: 1.152921504606847e+18
2024-08-14 21:57:31,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0
2024-08-14 21:57:52,721 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 from AS
2024-08-14 21:57:55,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0
2024-08-14 21:57:58,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.341e+01 2.610e+01 2.928e+01 1.579e+02, threshold=5.221e+01, percent-clipped=2.0
2024-08-14 21:58:17,535 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 from AS
2024-08-14 21:58:36,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2866820.0, ans=0.025
2024-08-14 21:58:39,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2866820.0, ans=0.025
2024-08-14 21:58:41,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11350, loss[loss=0.07948, beats_loss=0.0117, ecapa_loss=0.0001586, whisper_loss=0.0662, over 19245.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001549, whisper_loss=0.09098, over 3860789.47 frames. ], batch size: 78, lr: 3.10e-03, grad_scale: 1.152921504606847e+18
2024-08-14 21:58:47,199 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 22 from Vox, 43 from AS
2024-08-14 21:58:57,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2867020.0, ans=0.07
2024-08-14 21:59:59,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0
2024-08-14 22:00:08,087 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11400, loss[loss=0.08955, beats_loss=0.01261, ecapa_loss=0.0001213, whisper_loss=0.07572, over 21506.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001527, whisper_loss=0.09111, over 3854100.04 frames. ], batch size: 87, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:00:12,829 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 22:00:16,203 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
32 from LS+wenet, 19 from Vox, 37 from AS
2024-08-14 22:00:49,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.381e+01 2.566e+01 2.831e+01 4.188e+01, threshold=5.133e+01, percent-clipped=0.0
2024-08-14 22:01:09,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0
2024-08-14 22:01:16,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2867820.0, ans=0.125
2024-08-14 22:01:18,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2867820.0, ans=10.0
2024-08-14 22:01:31,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11450, loss[loss=0.1003, beats_loss=0.009125, ecapa_loss=0.000178, whisper_loss=0.08935, over 19779.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001529, whisper_loss=0.09063, over 3886414.75 frames. ], batch size: 81, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:01:45,252 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS
2024-08-14 22:01:54,103 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 22:02:00,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2868020.0, ans=0.125
2024-08-14 22:02:05,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2868120.0, ans=0.125
2024-08-14 22:02:17,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0
2024-08-14 22:02:29,970 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 11 from Vox, 34 from AS
2024-08-14 22:02:33,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0
2024-08-14 22:02:49,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2868320.0, ans=0.1
2024-08-14 22:02:50,627 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 26 from LS+wenet, 11 from Vox, 17 from AS
2024-08-14 22:02:53,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11500, loss[loss=0.1164, beats_loss=0.008171, ecapa_loss=0.0001716, whisper_loss=0.1065, over 22857.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001537, whisper_loss=0.09125, over 3892020.13 frames. ], batch size: 94, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:02:54,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=12.0
2024-08-14 22:03:10,541 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 from AS
2024-08-14 22:03:15,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2868520.0, ans=0.1
2024-08-14 22:03:19,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2868520.0, ans=0.125
2024-08-14 22:03:25,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2868620.0, ans=0.125
2024-08-14 22:03:34,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.455e+01 2.723e+01 3.029e+01 4.016e+01, threshold=5.445e+01, percent-clipped=0.0
2024-08-14 22:03:44,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2868720.0, ans=0.125
2024-08-14 22:03:45,985 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 17 from Vox, 20 from AS
2024-08-14 22:03:46,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2868720.0, ans=0.125
2024-08-14 22:03:47,469 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 22 from Vox, 33 from AS
2024-08-14 22:03:52,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2868720.0, ans=0.0
2024-08-14 22:04:08,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5
2024-08-14 22:04:18,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11550, loss[loss=0.1251, beats_loss=0.006408, ecapa_loss=0.0002288, whisper_loss=0.1164, over 18344.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001538, whisper_loss=0.09052, over 3883887.94 frames.
], batch size: 73, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:04:41,804 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 from AS
2024-08-14 22:05:13,626 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 from AS
2024-08-14 22:05:15,270 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 from AS
2024-08-14 22:05:25,856 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 22:05:34,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2869320.0, ans=0.125
2024-08-14 22:05:39,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0
2024-08-14 22:05:39,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11600, loss[loss=0.09861, beats_loss=0.0115, ecapa_loss=0.0001476, whisper_loss=0.08563, over 20529.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.000154, whisper_loss=0.09098, over 3863353.18 frames. ], batch size: 83, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:06:04,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2869520.0, ans=0.1
2024-08-14 22:06:10,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2869620.0, ans=0.0
2024-08-14 22:06:19,927 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.332e+01 2.648e+01 3.162e+01 2.380e+02, threshold=5.297e+01, percent-clipped=2.0
2024-08-14 22:06:41,569 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 11 from Vox, 33 from AS
2024-08-14 22:06:51,100 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS
2024-08-14 22:06:55,971 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-14 22:07:00,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11650, loss[loss=0.08863, beats_loss=0.01067, ecapa_loss=0.0001498, whisper_loss=0.07646, over 15170.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001525, whisper_loss=0.09074, over 3863233.14 frames. ], batch size: 60, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:07:07,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2869920.0, ans=0.125
2024-08-14 22:07:29,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0
2024-08-14 22:07:32,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2870020.0, ans=0.0
2024-08-14 22:07:35,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2870120.0, ans=0.125
2024-08-14 22:07:45,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2870120.0, ans=0.04949747468305833
2024-08-14 22:07:53,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2870220.0, ans=0.125
2024-08-14 22:07:56,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2870220.0, ans=0.0
2024-08-14 22:07:59,168 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 from AS
2024-08-14 22:08:01,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0
2024-08-14 22:08:04,360 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 from AS
2024-08-14 22:08:17,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2870320.0, ans=0.125
2024-08-14 22:08:22,354 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 from AS
2024-08-14 22:08:23,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11700, loss[loss=0.0916, beats_loss=0.01235, ecapa_loss=0.0001473, whisper_loss=0.07777, over 19437.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001536, whisper_loss=0.09069, over 3899958.39 frames. ], batch size: 81, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:08:43,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2870520.0, ans=0.0
2024-08-14 22:09:04,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5
2024-08-14 22:09:06,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.421e+01 2.717e+01 3.040e+01 4.718e+01, threshold=5.433e+01, percent-clipped=0.0
2024-08-14 22:09:07,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.15 vs. limit=22.5
2024-08-14 22:09:13,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.67 vs. limit=15.0
2024-08-14 22:09:16,323 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
19 from LS+wenet, 23 from Vox, 22 from AS
2024-08-14 22:09:35,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2870820.0, ans=0.125
2024-08-14 22:09:36,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2870820.0, ans=0.1
2024-08-14 22:09:37,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2870820.0, ans=0.125
2024-08-14 22:09:38,584 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS
2024-08-14 22:09:45,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11750, loss[loss=0.0954, beats_loss=0.01155, ecapa_loss=0.0001621, whisper_loss=0.08223, over 17799.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001526, whisper_loss=0.09085, over 3908153.14 frames. ], batch size: 75, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:10:00,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2870920.0, ans=0.125
2024-08-14 22:10:04,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5
2024-08-14 22:10:20,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2871020.0, ans=0.125
2024-08-14 22:10:29,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2871120.0, ans=0.035
2024-08-14 22:10:38,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2871120.0, ans=0.125
2024-08-14 22:10:53,502 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 from AS
2024-08-14 22:10:59,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2871220.0, ans=0.05
2024-08-14 22:11:22,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11800, loss[loss=0.09381, beats_loss=0.01154, ecapa_loss=0.0001768, whisper_loss=0.0805, over 20968.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001529, whisper_loss=0.09082, over 3911269.91 frames. ], batch size: 88, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:11:25,947 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 from AS
2024-08-14 22:11:34,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2871420.0, ans=0.125
2024-08-14 22:11:39,804 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS
2024-08-14 22:11:42,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2871520.0, ans=0.125
2024-08-14 22:11:43,948 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 from AS
2024-08-14 22:11:45,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.19 vs. limit=22.5
2024-08-14 22:11:54,314 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 from AS
2024-08-14 22:11:55,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2024-08-14 22:12:00,399 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 from AS
2024-08-14 22:12:03,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.341e+01 2.560e+01 2.807e+01 8.705e+01, threshold=5.119e+01, percent-clipped=2.0
2024-08-14 22:12:04,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2871620.0, ans=0.2
2024-08-14 22:12:12,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0
2024-08-14 22:12:30,145 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 from AS
2024-08-14 22:12:41,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2871820.0, ans=0.04949747468305833
2024-08-14 22:12:54,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2871920.0, ans=0.0
2024-08-14 22:12:55,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11850, loss[loss=0.1055, beats_loss=0.01184, ecapa_loss=0.0001281, whisper_loss=0.09241, over 16219.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001523, whisper_loss=0.09087, over 3939193.73 frames. ], batch size: 64, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:13:12,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2871920.0, ans=0.0
2024-08-14 22:13:43,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2872120.0, ans=0.2
2024-08-14 22:13:44,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs.
limit=6.0
2024-08-14 22:13:56,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=12.0
2024-08-14 22:13:57,682 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS
2024-08-14 22:14:17,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2872220.0, ans=0.125
2024-08-14 22:14:42,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=12.0
2024-08-14 22:14:48,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11900, loss[loss=0.1048, beats_loss=0.01052, ecapa_loss=0.0001565, whisper_loss=0.09272, over 20566.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001526, whisper_loss=0.09175, over 3928180.55 frames. ], batch size: 77, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:15:06,030 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 from AS
2024-08-14 22:15:22,258 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 25 from Vox, 26 from AS
2024-08-14 22:15:44,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.256e+01 2.473e+01 2.865e+01 1.430e+02, threshold=4.947e+01, percent-clipped=1.0
2024-08-14 22:16:04,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2872720.0, ans=0.125
2024-08-14 22:16:04,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2872720.0, ans=0.125
2024-08-14 22:16:21,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2872820.0, ans=0.125
2024-08-14 22:16:38,634 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 22:16:40,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 11950, loss[loss=0.09403, beats_loss=0.01144, ecapa_loss=0.0001764, whisper_loss=0.08083, over 16904.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001539, whisper_loss=0.09194, over 3916466.03 frames. ], batch size: 69, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:16:41,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2872920.0, ans=0.0
2024-08-14 22:16:41,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2872920.0, ans=0.125
2024-08-14 22:16:42,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2872920.0, ans=0.0
2024-08-14 22:16:51,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2872920.0, ans=0.125
2024-08-14 22:16:55,335 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 from AS
2024-08-14 22:17:15,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=22.5
2024-08-14 22:17:29,831 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 from AS
2024-08-14 22:17:45,050 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 from AS
2024-08-14 22:17:49,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2873220.0, ans=0.0
2024-08-14 22:18:04,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873220.0, ans=0.1
2024-08-14 22:18:10,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2873320.0, ans=0.0
2024-08-14 22:18:18,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0
2024-08-14 22:18:21,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12000, loss[loss=0.07629, beats_loss=0.01195, ecapa_loss=0.0001137, whisper_loss=0.0632, over 15403.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001532, whisper_loss=0.09153, over 3890487.44 frames. ], batch size: 62, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:18:21,964 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-14 22:19:04,384 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005404, whisper_loss=0.2466, over 922467.00 frames.
2024-08-14 22:19:20,867 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on SV_voxceleb1: loss=0.004324, beats_loss=0, ecapa_loss=0.0004324, whisper_loss=0, over 939242.00 frames.
2024-08-14 22:21:26,121 INFO [train_multi_KD3.py:1149] (0/4) Epoch 20, validation on AT_audioset: loss=0.02348, beats_loss=0.02348, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 22:21:26,125 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB
2024-08-14 22:22:03,636 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.280e+01 2.500e+01 2.714e+01 9.772e+01, threshold=5.000e+01, percent-clipped=1.0
2024-08-14 22:22:17,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2873720.0, ans=0.125
2024-08-14 22:22:30,965 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 from AS
2024-08-14 22:22:35,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2873820.0, ans=0.025
2024-08-14 22:22:41,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12050, loss[loss=0.1057, beats_loss=0.01035, ecapa_loss=0.0001617, whisper_loss=0.09375, over 21449.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001525, whisper_loss=0.09082, over 3874407.19 frames. ], batch size: 87, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:22:43,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2873920.0, ans=0.125
2024-08-14 22:22:54,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.25 vs.
limit=22.5 2024-08-14 22:23:01,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2874020.0, ans=10.0 2024-08-14 22:23:20,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2024-08-14 22:23:24,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874220.0, ans=0.1 2024-08-14 22:23:46,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2874320.0, ans=0.1 2024-08-14 22:23:47,522 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 22:23:52,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2024-08-14 22:23:56,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12100, loss[loss=0.1012, beats_loss=0.009963, ecapa_loss=0.0001334, whisper_loss=0.08986, over 23620.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001514, whisper_loss=0.09079, over 3866018.24 frames. ], batch size: 92, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:24:03,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2874420.0, ans=0.125 2024-08-14 22:24:07,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874420.0, ans=0.1 2024-08-14 22:24:20,439 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 22:24:31,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2024-08-14 22:24:34,440 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 22:24:35,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.418e+01 2.574e+01 2.910e+01 4.724e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-14 22:24:39,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2874620.0, ans=0.125 2024-08-14 22:24:46,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2874720.0, ans=0.0 2024-08-14 22:25:06,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2874820.0, ans=0.125 2024-08-14 22:25:13,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12150, loss[loss=0.09783, beats_loss=0.01203, ecapa_loss=0.0001374, whisper_loss=0.08442, over 22758.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001526, whisper_loss=0.09123, over 3851338.25 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:25:30,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2875020.0, ans=0.2 2024-08-14 22:25:57,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2875220.0, ans=0.05 2024-08-14 22:25:57,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.52 vs. 
limit=22.5 2024-08-14 22:26:06,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-14 22:26:23,021 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 22:26:24,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2875320.0, ans=15.0 2024-08-14 22:26:28,812 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12200, loss[loss=0.1117, beats_loss=0.008454, ecapa_loss=0.0001441, whisper_loss=0.1018, over 20855.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001534, whisper_loss=0.09077, over 3858238.21 frames. ], batch size: 82, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:26:40,930 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 22:26:57,934 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 22:26:59,461 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 22:27:02,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-08-14 22:27:03,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2024-08-14 22:27:06,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.621e+01 2.982e+01 1.533e+02, threshold=5.242e+01, percent-clipped=1.0 2024-08-14 22:27:19,617 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 22:27:19,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2875720.0, ans=0.0 2024-08-14 22:27:24,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2875720.0, ans=0.125 2024-08-14 22:27:45,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12250, loss[loss=0.09543, beats_loss=0.01171, ecapa_loss=0.0001736, whisper_loss=0.08198, over 22057.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001537, whisper_loss=0.0906, over 3834569.16 frames. ], batch size: 93, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:27:52,227 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 22:27:55,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2875920.0, ans=0.125 2024-08-14 22:27:57,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875920.0, ans=0.1 2024-08-14 22:28:00,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2876020.0, ans=0.0 2024-08-14 22:28:00,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2876020.0, ans=0.125 2024-08-14 22:28:29,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2876120.0, ans=0.125 2024-08-14 22:28:30,611 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
32 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 22:28:38,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2876220.0, ans=0.05 2024-08-14 22:29:00,007 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 22:29:02,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12300, loss[loss=0.07554, beats_loss=0.009219, ecapa_loss=0.0001515, whisper_loss=0.06481, over 15408.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001536, whisper_loss=0.09067, over 3820290.44 frames. ], batch size: 60, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:29:10,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2876420.0, ans=0.0 2024-08-14 22:29:22,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-08-14 22:29:31,365 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:29:32,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2876620.0, ans=0.125 2024-08-14 22:29:39,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.269e+01 2.570e+01 2.894e+01 3.715e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-14 22:30:04,153 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-14 22:30:05,607 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 22:30:11,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2876820.0, ans=0.2 2024-08-14 22:30:16,627 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 22:30:17,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12350, loss[loss=0.1153, beats_loss=0.01053, ecapa_loss=0.0001532, whisper_loss=0.1032, over 23336.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001533, whisper_loss=0.09108, over 3842154.40 frames. ], batch size: 92, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:30:18,067 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 22:30:22,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2876920.0, ans=0.125 2024-08-14 22:30:33,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2877020.0, ans=0.0 2024-08-14 22:30:42,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=12.0 2024-08-14 22:30:43,960 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 22:30:47,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2877020.0, ans=0.125 2024-08-14 22:30:49,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-08-14 22:31:00,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=15.0 2024-08-14 22:31:11,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2877220.0, ans=0.2 2024-08-14 22:31:19,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2877320.0, ans=0.09899494936611666 2024-08-14 22:31:34,817 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12400, loss[loss=0.1131, beats_loss=0.01051, ecapa_loss=0.0001554, whisper_loss=0.1011, over 22630.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001529, whisper_loss=0.09158, over 3870912.95 frames. ], batch size: 93, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:31:34,989 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 22:31:42,035 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 22:31:42,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2877420.0, ans=0.2 2024-08-14 22:31:47,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2024-08-14 22:31:59,138 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 22:32:12,108 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 22:32:13,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.352e+01 2.633e+01 2.974e+01 1.809e+02, threshold=5.265e+01, percent-clipped=2.0 2024-08-14 22:32:15,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2877620.0, ans=0.2 2024-08-14 22:32:31,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2024-08-14 22:32:49,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12450, loss[loss=0.08752, beats_loss=0.01014, ecapa_loss=0.0001836, whisper_loss=0.07554, over 16159.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01046, ecapa_loss=0.0001538, whisper_loss=0.09189, over 3885278.37 frames. ], batch size: 68, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:33:04,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-14 22:33:12,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2878020.0, ans=0.1 2024-08-14 22:33:27,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2878120.0, ans=0.1 2024-08-14 22:33:27,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2878120.0, ans=0.125 2024-08-14 22:33:39,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2878220.0, ans=0.125 2024-08-14 22:33:48,594 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 22:33:59,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0 2024-08-14 22:34:04,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12500, loss[loss=0.1264, beats_loss=0.008815, ecapa_loss=0.0001459, whisper_loss=0.1161, over 23708.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01052, ecapa_loss=0.0001532, whisper_loss=0.0918, over 3898317.64 frames. ], batch size: 92, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:34:16,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2878420.0, ans=0.2 2024-08-14 22:34:25,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878520.0, ans=0.1 2024-08-14 22:34:27,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2878520.0, ans=0.2 2024-08-14 22:34:28,511 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
43 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 22:34:31,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2878520.0, ans=0.125 2024-08-14 22:34:34,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2878620.0, ans=0.0 2024-08-14 22:34:43,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.340e+01 2.618e+01 2.965e+01 2.177e+02, threshold=5.235e+01, percent-clipped=3.0 2024-08-14 22:34:47,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2878620.0, ans=22.5 2024-08-14 22:34:56,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2878720.0, ans=0.125 2024-08-14 22:35:21,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12550, loss[loss=0.1221, beats_loss=0.01151, ecapa_loss=0.0001173, whisper_loss=0.1094, over 23697.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001525, whisper_loss=0.09202, over 3910270.26 frames. ], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:35:22,698 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 22:35:42,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2879020.0, ans=0.0 2024-08-14 22:35:51,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2879120.0, ans=0.0 2024-08-14 22:36:00,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-14 22:36:03,887 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 22:36:07,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2879220.0, ans=0.2 2024-08-14 22:36:35,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12600, loss[loss=0.105, beats_loss=0.008808, ecapa_loss=0.0001741, whisper_loss=0.0945, over 21637.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01056, ecapa_loss=0.0001528, whisper_loss=0.0922, over 3923971.65 frames. ], batch size: 87, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:36:36,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2879420.0, ans=0.125 2024-08-14 22:36:37,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2879420.0, ans=0.125 2024-08-14 22:36:38,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2024-08-14 22:36:55,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-14 22:36:58,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2879520.0, ans=0.125 2024-08-14 22:37:04,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2024-08-14 22:37:08,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2879620.0, ans=0.125 2024-08-14 22:37:10,984 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 22:37:12,807 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 22:37:13,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.404e+01 2.680e+01 3.035e+01 5.751e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-14 22:37:37,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2879820.0, ans=0.0 2024-08-14 22:37:51,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12650, loss[loss=0.08944, beats_loss=0.01224, ecapa_loss=0.0001544, whisper_loss=0.07566, over 18617.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001539, whisper_loss=0.09107, over 3915737.06 frames. ], batch size: 73, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:37:55,905 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 22:38:01,937 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-288000.pt 2024-08-14 22:38:07,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2879920.0, ans=0.1 2024-08-14 22:39:05,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2880320.0, ans=0.125 2024-08-14 22:39:09,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12700, loss[loss=0.09976, beats_loss=0.01074, ecapa_loss=0.0001946, whisper_loss=0.08708, over 21146.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001534, whisper_loss=0.09097, over 3902981.57 frames. 
], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:39:22,446 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 22:39:23,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2880520.0, ans=0.125 2024-08-14 22:39:28,272 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 22:39:30,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2880520.0, ans=0.0 2024-08-14 22:39:44,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2880620.0, ans=0.125 2024-08-14 22:39:48,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.668e+01 2.293e+01 2.553e+01 2.866e+01 4.469e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-14 22:40:16,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2880820.0, ans=0.125 2024-08-14 22:40:22,652 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 22:40:23,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.43 vs. limit=22.5 2024-08-14 22:40:25,248 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 22:40:26,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880920.0, ans=0.1 2024-08-14 22:40:28,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12750, loss[loss=0.1132, beats_loss=0.00905, ecapa_loss=0.0001578, whisper_loss=0.1025, over 22242.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001548, whisper_loss=0.09021, over 3870292.14 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:40:36,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2880920.0, ans=0.125 2024-08-14 22:40:54,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2881020.0, ans=0.125 2024-08-14 22:41:05,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2881120.0, ans=0.1 2024-08-14 22:41:32,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2881320.0, ans=0.125 2024-08-14 22:41:41,385 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 22:41:47,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12800, loss[loss=0.08661, beats_loss=0.01019, ecapa_loss=0.0002072, whisper_loss=0.07435, over 20237.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001556, whisper_loss=0.08986, over 3868421.02 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:42:08,934 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 22:42:27,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.348e+01 2.569e+01 2.991e+01 4.323e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-14 22:42:41,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2881720.0, ans=0.0 2024-08-14 22:42:50,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2881820.0, ans=0.125 2024-08-14 22:42:58,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2024-08-14 22:43:07,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12850, loss[loss=0.1115, beats_loss=0.01082, ecapa_loss=0.0001501, whisper_loss=0.09917, over 21602.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001568, whisper_loss=0.09051, over 3852450.17 frames. ], batch size: 85, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:43:34,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2882020.0, ans=0.125 2024-08-14 22:43:40,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-14 22:44:26,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12900, loss[loss=0.08351, beats_loss=0.01066, ecapa_loss=0.0001189, whisper_loss=0.07166, over 16523.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01078, ecapa_loss=0.0001556, whisper_loss=0.08956, over 3837658.44 frames. 
], batch size: 62, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:44:49,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2882520.0, ans=0.0 2024-08-14 22:44:53,975 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 22:44:55,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882520.0, ans=0.1 2024-08-14 22:44:58,674 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 22:45:02,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2882620.0, ans=0.2 2024-08-14 22:45:06,658 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.310e+01 2.541e+01 3.103e+01 4.872e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 22:45:07,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2882620.0, ans=0.0 2024-08-14 22:45:11,884 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 22:45:15,294 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:45:16,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2882720.0, ans=0.125 2024-08-14 22:45:34,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2882820.0, ans=0.1 2024-08-14 22:45:42,131 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 22:45:43,815 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 22:45:46,684 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 12950, loss[loss=0.09214, beats_loss=0.01135, ecapa_loss=0.000138, whisper_loss=0.07941, over 13678.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.000156, whisper_loss=0.08962, over 3843904.84 frames. ], batch size: 53, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:45:56,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2882920.0, ans=0.0 2024-08-14 22:46:23,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2883120.0, ans=0.0 2024-08-14 22:46:43,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2883220.0, ans=0.1 2024-08-14 22:46:46,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2883220.0, ans=0.125 2024-08-14 22:46:51,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2024-08-14 22:47:02,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2883320.0, ans=0.1 2024-08-14 22:47:05,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13000, loss[loss=0.08911, beats_loss=0.01032, ecapa_loss=0.0001839, whisper_loss=0.07695, over 21063.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.000155, whisper_loss=0.08996, over 3856090.48 frames. 
], batch size: 86, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:47:22,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2883520.0, ans=0.125 2024-08-14 22:47:24,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2883520.0, ans=0.025 2024-08-14 22:47:32,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.06 vs. limit=22.5 2024-08-14 22:47:34,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2883520.0, ans=0.0 2024-08-14 22:47:36,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2883620.0, ans=0.125 2024-08-14 22:47:44,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.265e+01 2.555e+01 2.965e+01 9.531e+01, threshold=5.110e+01, percent-clipped=1.0 2024-08-14 22:47:52,358 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 22:48:03,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-14 22:48:15,489 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 22:48:15,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2883820.0, ans=0.0 2024-08-14 22:48:23,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13050, loss[loss=0.1016, beats_loss=0.01118, ecapa_loss=0.0001704, whisper_loss=0.08876, over 22285.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01067, ecapa_loss=0.0001558, whisper_loss=0.08971, over 3816193.15 frames. 
], batch size: 94, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:48:33,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2024-08-14 22:48:46,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2024-08-14 22:48:48,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.08 vs. limit=22.5 2024-08-14 22:48:58,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2884120.0, ans=0.0 2024-08-14 22:49:00,351 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 22:49:04,905 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 22:49:05,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-14 22:49:11,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2884220.0, ans=10.0 2024-08-14 22:49:19,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2884220.0, ans=0.125 2024-08-14 22:49:39,055 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13100, loss[loss=0.1127, beats_loss=0.01069, ecapa_loss=0.0001371, whisper_loss=0.1007, over 19567.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01067, ecapa_loss=0.0001544, whisper_loss=0.08947, over 3848699.80 frames. 
], batch size: 79, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:49:40,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-14 22:49:59,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2884520.0, ans=0.1 2024-08-14 22:50:03,121 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 22:50:10,318 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 22:50:10,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2884620.0, ans=0.125 2024-08-14 22:50:17,756 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.252e+01 2.466e+01 2.762e+01 3.730e+01, threshold=4.933e+01, percent-clipped=0.0 2024-08-14 22:50:55,044 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13150, loss[loss=0.08141, beats_loss=0.01422, ecapa_loss=0.0001164, whisper_loss=0.06603, over 21887.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001526, whisper_loss=0.08965, over 3869347.14 frames. ], batch size: 89, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:51:07,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2884920.0, ans=0.2 2024-08-14 22:51:47,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2885220.0, ans=0.1 2024-08-14 22:51:52,044 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
24 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 22:51:53,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2885220.0, ans=0.1 2024-08-14 22:51:59,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2885320.0, ans=0.0 2024-08-14 22:52:06,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.41 vs. limit=15.0 2024-08-14 22:52:11,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13200, loss[loss=0.1156, beats_loss=0.01081, ecapa_loss=0.000169, whisper_loss=0.1031, over 22960.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001527, whisper_loss=0.0906, over 3874849.77 frames. ], batch size: 95, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:52:14,688 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 22:52:26,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2885520.0, ans=0.0 2024-08-14 22:52:31,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2885520.0, ans=0.0 2024-08-14 22:52:49,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.330e+01 2.568e+01 2.844e+01 4.836e+01, threshold=5.136e+01, percent-clipped=0.0 2024-08-14 22:52:56,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2885720.0, ans=0.09899494936611666 2024-08-14 22:53:06,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2885720.0, ans=0.125 2024-08-14 22:53:09,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, 
batch_count=2885720.0, ans=0.2 2024-08-14 22:53:12,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2885820.0, ans=0.125 2024-08-14 22:53:27,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13250, loss[loss=0.1027, beats_loss=0.01179, ecapa_loss=0.0001496, whisper_loss=0.08938, over 23200.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001533, whisper_loss=0.09, over 3864713.62 frames. ], batch size: 92, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:54:05,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2886120.0, ans=0.125 2024-08-14 22:54:11,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2886220.0, ans=0.125 2024-08-14 22:54:13,405 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.432e-01 2024-08-14 22:54:34,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2886320.0, ans=0.125 2024-08-14 22:54:38,175 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 39 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 22:54:42,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13300, loss[loss=0.09888, beats_loss=0.0107, ecapa_loss=0.0001467, whisper_loss=0.08672, over 18623.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01067, ecapa_loss=0.0001521, whisper_loss=0.09003, over 3861148.95 frames. ], batch size: 72, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:55:05,700 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
11 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 22:55:18,429 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:55:20,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.319e+01 2.603e+01 2.951e+01 5.075e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 22:55:24,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2886620.0, ans=0.125 2024-08-14 22:55:26,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=22.5 2024-08-14 22:55:30,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.61 vs. limit=10.0 2024-08-14 22:55:32,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2886720.0, ans=0.0 2024-08-14 22:55:33,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2886720.0, ans=0.125 2024-08-14 22:55:55,845 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 22:55:58,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13350, loss[loss=0.1151, beats_loss=0.0101, ecapa_loss=0.0001942, whisper_loss=0.1031, over 21458.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001522, whisper_loss=0.08998, over 3846567.50 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:55:59,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.09 vs. limit=15.0 2024-08-14 22:56:09,554 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-14 22:56:09,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2886920.0, ans=0.1 2024-08-14 22:56:12,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2887020.0, ans=0.125 2024-08-14 22:56:19,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-14 22:56:21,567 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-14 22:56:23,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2887020.0, ans=0.125 2024-08-14 22:56:35,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2887120.0, ans=0.0 2024-08-14 22:56:52,926 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 29 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 22:56:59,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2887320.0, ans=0.09899494936611666 2024-08-14 22:57:00,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2887320.0, ans=0.125 2024-08-14 22:57:03,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.29 vs. limit=22.5 2024-08-14 22:57:13,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13400, loss[loss=0.09939, beats_loss=0.01238, ecapa_loss=0.0001346, whisper_loss=0.08566, over 21585.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001537, whisper_loss=0.09024, over 3814083.57 frames. 
], batch size: 88, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:57:21,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2887420.0, ans=0.125 2024-08-14 22:57:33,436 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-14 22:57:33,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2887520.0, ans=0.0 2024-08-14 22:57:40,764 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 22:57:45,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2887620.0, ans=0.125 2024-08-14 22:57:50,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.321e+01 2.514e+01 2.715e+01 4.037e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-14 22:57:51,194 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 22:57:59,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2887720.0, ans=0.1 2024-08-14 22:58:11,852 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 22:58:28,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13450, loss[loss=0.103, beats_loss=0.01185, ecapa_loss=0.0001018, whisper_loss=0.09014, over 16455.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001535, whisper_loss=0.08987, over 3805534.00 frames. ], batch size: 60, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:58:33,955 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-14 22:58:45,756 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
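Note the `grad_scale` jump in the records above: 5.764607523034235e+17 is exactly 2**59 and 1.152921504606847e+18 is exactly 2**60, i.e. the scale doubled between batch 13350 and batch 13400. This is the signature of dynamic loss scaling, which periodically grows the scale after a run of overflow-free steps and backs off when an overflow is detected. The following is a toy illustration of that mechanism only, not icefall's actual scaler (the class name, growth interval, and factors here are assumptions for the sketch):

```python
class DynamicLossScale:
    """Toy dynamic loss scaler (illustrative; not icefall's implementation)."""

    def __init__(self, init_scale=2.0 ** 59, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            # Overflow detected: back off immediately and restart the counter.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # A full interval of clean steps: grow the scale, e.g. 2**59 -> 2**60.
                self.scale *= self.growth_factor
                self._good_steps = 0
        return self.scale


scaler = DynamicLossScale()
for _ in range(2000):          # 2000 consecutive overflow-free steps
    scaler.update(found_inf=False)
# scaler.scale is now 2**60, matching the doubling seen in the log
```

The later halving back to 5.76e+17 (visible from batch 13800 onward) would correspond to the backoff branch firing once after an overflow.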
13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 22:58:46,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2888020.0, ans=0.125 2024-08-14 22:59:00,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2888120.0, ans=0.0 2024-08-14 22:59:01,943 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 22:59:12,265 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:59:13,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2888220.0, ans=0.125 2024-08-14 22:59:39,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2888420.0, ans=0.125 2024-08-14 22:59:40,572 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13500, loss[loss=0.1022, beats_loss=0.01078, ecapa_loss=0.0001954, whisper_loss=0.08943, over 18884.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001537, whisper_loss=0.09019, over 3828250.39 frames. ], batch size: 77, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:00:18,919 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.433e+01 2.626e+01 2.844e+01 4.134e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 23:00:22,471 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 23:00:30,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2888720.0, ans=0.2 2024-08-14 23:00:44,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888820.0, ans=0.1 2024-08-14 23:00:55,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2888920.0, ans=0.125 2024-08-14 23:00:56,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13550, loss[loss=0.104, beats_loss=0.01044, ecapa_loss=0.0001711, whisper_loss=0.09181, over 22647.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001534, whisper_loss=0.09021, over 3835903.78 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:01:02,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2888920.0, ans=0.5 2024-08-14 23:01:04,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2888920.0, ans=0.125 2024-08-14 23:01:10,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5 2024-08-14 23:01:12,781 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 23:01:14,095 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 23:01:20,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2889020.0, ans=0.2 2024-08-14 23:01:21,687 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:01:33,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2889120.0, ans=0.125 2024-08-14 23:01:42,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2889220.0, ans=0.125 2024-08-14 23:01:49,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2889220.0, ans=0.1 2024-08-14 23:01:53,726 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-14 23:01:59,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2889320.0, ans=0.0 2024-08-14 23:02:09,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13600, loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001428, whisper_loss=0.09061, over 20705.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001531, whisper_loss=0.09037, over 3837248.11 frames. ], batch size: 81, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:02:32,520 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
16 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 23:02:44,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2889620.0, ans=0.125 2024-08-14 23:02:45,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.266e+01 2.507e+01 2.843e+01 1.129e+02, threshold=5.014e+01, percent-clipped=1.0 2024-08-14 23:03:09,776 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 23:03:11,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2889820.0, ans=0.125 2024-08-14 23:03:15,275 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 23:03:22,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13650, loss[loss=0.1303, beats_loss=0.01007, ecapa_loss=0.0001634, whisper_loss=0.1186, over 22880.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001528, whisper_loss=0.09045, over 3816919.23 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:03:28,829 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 23:03:32,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2889920.0, ans=0.0 2024-08-14 23:03:36,504 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-14 23:03:44,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2890020.0, ans=0.0 2024-08-14 23:03:47,461 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:03:56,047 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 23:03:56,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-08-14 23:04:02,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2890120.0, ans=0.1 2024-08-14 23:04:19,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2024-08-14 23:04:27,686 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 23:04:36,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2890420.0, ans=0.125 2024-08-14 23:04:37,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13700, loss[loss=0.1001, beats_loss=0.009377, ecapa_loss=0.0001323, whisper_loss=0.08944, over 18595.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.000153, whisper_loss=0.0907, over 3860070.60 frames. 
], batch size: 71, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:04:38,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2890420.0, ans=0.125 2024-08-14 23:04:43,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2890420.0, ans=0.1 2024-08-14 23:04:43,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2890420.0, ans=0.125 2024-08-14 23:04:47,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2890420.0, ans=0.125 2024-08-14 23:04:56,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2890520.0, ans=0.0 2024-08-14 23:05:11,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=15.0 2024-08-14 23:05:14,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-14 23:05:15,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.409e+01 2.626e+01 3.025e+01 7.983e+01, threshold=5.251e+01, percent-clipped=2.0 2024-08-14 23:05:17,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2890620.0, ans=0.2 2024-08-14 23:05:36,425 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 23:05:52,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13750, loss[loss=0.1045, beats_loss=0.009255, ecapa_loss=0.0001791, whisper_loss=0.09341, over 19817.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.000153, whisper_loss=0.09129, over 3870161.19 frames. ], batch size: 81, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:05:59,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2890920.0, ans=0.125 2024-08-14 23:06:11,294 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 23:06:13,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2024-08-14 23:06:30,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2891120.0, ans=0.2 2024-08-14 23:06:31,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2024-08-14 23:06:32,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2891120.0, ans=0.09899494936611666 2024-08-14 23:06:35,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2891120.0, ans=0.125 2024-08-14 23:06:48,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2891220.0, ans=0.1 2024-08-14 23:06:57,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2891320.0, ans=0.2 2024-08-14 23:06:59,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2891320.0, ans=0.09899494936611666 2024-08-14 23:07:07,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13800, loss[loss=0.09487, beats_loss=0.01147, ecapa_loss=0.0001405, whisper_loss=0.08199, over 
22155.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001525, whisper_loss=0.09115, over 3848538.56 frames. ], batch size: 92, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:07:24,686 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 23:07:27,753 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 23:07:31,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2891520.0, ans=0.0 2024-08-14 23:07:46,925 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.261e+01 2.530e+01 3.041e+01 4.646e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-14 23:07:47,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2891620.0, ans=0.1 2024-08-14 23:08:01,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=12.0 2024-08-14 23:08:22,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13850, loss[loss=0.0932, beats_loss=0.01133, ecapa_loss=0.0001774, whisper_loss=0.08009, over 18818.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.09168, over 3873853.11 frames. ], batch size: 78, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:08:27,596 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 23:09:07,859 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-14 23:09:08,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2892220.0, ans=0.125 2024-08-14 23:09:21,448 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
37 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-14 23:09:31,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2892320.0, ans=6.0 2024-08-14 23:09:34,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2892320.0, ans=0.1 2024-08-14 23:09:40,141 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13900, loss[loss=0.1071, beats_loss=0.01062, ecapa_loss=0.0001734, whisper_loss=0.09473, over 21249.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.000154, whisper_loss=0.0917, over 3885021.23 frames. ], batch size: 89, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:09:43,458 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 23:09:46,767 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:10:09,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2892620.0, ans=0.025 2024-08-14 23:10:20,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.401e+01 2.644e+01 3.017e+01 4.177e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-14 23:10:29,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2892720.0, ans=0.1 2024-08-14 23:10:37,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2892720.0, ans=0.0 2024-08-14 23:10:47,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2892820.0, ans=0.125 2024-08-14 23:10:50,401 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 23:10:56,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 13950, loss[loss=0.1009, beats_loss=0.008206, ecapa_loss=0.000166, whisper_loss=0.09101, over 17883.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01049, ecapa_loss=0.0001548, whisper_loss=0.09203, over 3894684.39 frames. ], batch size: 71, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:10:59,289 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 23:11:01,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2892920.0, ans=0.125 2024-08-14 23:11:15,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2893020.0, ans=15.0 2024-08-14 23:11:19,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2893020.0, ans=0.1 2024-08-14 23:11:29,230 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 23:11:35,656 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 23:11:37,410 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 31 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 23:11:45,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2893220.0, ans=0.125 2024-08-14 23:11:49,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2893220.0, ans=0.0 2024-08-14 23:11:56,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2893320.0, ans=0.0 2024-08-14 23:11:58,454 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 23:12:08,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2024-08-14 23:12:11,023 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14000, loss[loss=0.1066, beats_loss=0.009865, ecapa_loss=0.0002089, whisper_loss=0.09467, over 16792.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01045, ecapa_loss=0.0001545, whisper_loss=0.09201, over 3898035.87 frames. ], batch size: 70, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:12:39,006 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 23:12:44,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-14 23:12:50,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.312e+01 2.545e+01 2.868e+01 4.909e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-14 23:12:55,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2893720.0, ans=0.95 2024-08-14 23:13:13,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2893820.0, ans=0.0 2024-08-14 23:13:24,870 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-14 23:13:27,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14050, loss[loss=0.1041, beats_loss=0.008458, ecapa_loss=0.0001454, whisper_loss=0.09419, over 15005.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01056, ecapa_loss=0.0001534, whisper_loss=0.0923, over 3907563.24 frames. 
], batch size: 54, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:13:29,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2893920.0, ans=0.09899494936611666 2024-08-14 23:13:35,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-14 23:13:41,229 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 23:13:53,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2894020.0, ans=0.2 2024-08-14 23:14:07,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-14 23:14:29,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2894320.0, ans=0.1 2024-08-14 23:14:42,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14100, loss[loss=0.1139, beats_loss=0.008449, ecapa_loss=0.000183, whisper_loss=0.1037, over 21555.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01059, ecapa_loss=0.0001532, whisper_loss=0.09185, over 3906991.81 frames. 
], batch size: 88, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:14:47,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2894420.0, ans=0.125 2024-08-14 23:15:21,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.405e+01 2.712e+01 3.125e+01 2.483e+02, threshold=5.424e+01, percent-clipped=1.0 2024-08-14 23:15:38,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2894720.0, ans=0.125 2024-08-14 23:15:41,174 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 23:15:50,425 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:15:54,584 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 23:15:56,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2894920.0, ans=0.95 2024-08-14 23:15:57,332 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14150, loss[loss=0.1052, beats_loss=0.01003, ecapa_loss=0.0001612, whisper_loss=0.09356, over 22384.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001522, whisper_loss=0.0921, over 3900296.32 frames. ], batch size: 89, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:16:13,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2895020.0, ans=0.0 2024-08-14 23:16:16,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-08-14 23:16:35,379 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 23:16:43,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2895220.0, ans=0.125 2024-08-14 23:16:56,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2895320.0, ans=0.125 2024-08-14 23:17:12,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14200, loss[loss=0.1267, beats_loss=0.00766, ecapa_loss=0.0001467, whisper_loss=0.1175, over 19215.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001518, whisper_loss=0.09171, over 3913641.10 frames. ], batch size: 72, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:17:21,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2895420.0, ans=0.0 2024-08-14 23:17:25,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2895420.0, ans=0.125 2024-08-14 23:17:48,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2895620.0, ans=0.125 2024-08-14 23:17:52,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.327e+01 2.651e+01 3.057e+01 2.487e+02, threshold=5.302e+01, percent-clipped=2.0 2024-08-14 23:18:06,631 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 23:18:07,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-14 23:18:17,475 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 8 from Vox, 37 fro AS 2024-08-14 23:18:26,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2895820.0, ans=0.0 2024-08-14 23:18:28,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14250, loss[loss=0.1243, beats_loss=0.01056, ecapa_loss=0.00014, whisper_loss=0.1123, over 23385.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0106, ecapa_loss=0.0001506, whisper_loss=0.09228, over 3926601.74 frames. ], batch size: 91, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:18:34,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-08-14 23:18:37,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2895920.0, ans=0.0 2024-08-14 23:18:43,561 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-14 23:19:04,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.14 vs. limit=15.0 2024-08-14 23:19:07,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0 2024-08-14 23:19:10,737 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 23:19:14,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2896220.0, ans=0.0 2024-08-14 23:19:44,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2024-08-14 23:19:45,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14300, loss[loss=0.1039, beats_loss=0.01052, ecapa_loss=0.0001378, whisper_loss=0.09199, over 19490.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001501, whisper_loss=0.09108, over 3942380.37 frames. ], batch size: 79, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:19:49,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2024-08-14 23:19:57,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2896420.0, ans=0.5 2024-08-14 23:20:00,314 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 23:20:16,247 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 23:20:23,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.328e+01 2.532e+01 2.890e+01 9.959e+01, threshold=5.063e+01, percent-clipped=3.0 2024-08-14 23:20:33,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2896720.0, ans=0.125 2024-08-14 23:20:35,546 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 23:20:39,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. 
limit=6.0 2024-08-14 23:20:47,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2896820.0, ans=0.125 2024-08-14 23:20:58,746 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14350, loss[loss=0.09882, beats_loss=0.01052, ecapa_loss=0.0001545, whisper_loss=0.08676, over 22796.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001513, whisper_loss=0.09148, over 3965916.14 frames. ], batch size: 90, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:21:06,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2896920.0, ans=0.125 2024-08-14 23:21:15,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2897020.0, ans=0.125 2024-08-14 23:21:29,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2897120.0, ans=0.125 2024-08-14 23:21:35,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2897120.0, ans=0.125 2024-08-14 23:21:47,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2897220.0, ans=0.2 2024-08-14 23:21:57,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2897320.0, ans=0.1 2024-08-14 23:21:58,956 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 23:22:07,985 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 23:22:10,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2897420.0, ans=0.125 2024-08-14 23:22:11,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14400, loss[loss=0.1094, beats_loss=0.01166, ecapa_loss=0.0001511, whisper_loss=0.09626, over 21671.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001526, whisper_loss=0.09106, over 3958960.85 frames. ], batch size: 89, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:22:13,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-14 23:22:15,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-14 23:22:19,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2897420.0, ans=0.05 2024-08-14 23:22:26,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2897520.0, ans=0.125 2024-08-14 23:22:31,100 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.145e-01 2024-08-14 23:22:43,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2897620.0, ans=10.0 2024-08-14 23:22:43,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2897620.0, ans=0.0 2024-08-14 23:22:50,059 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 23:22:51,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.446e+01 2.693e+01 3.071e+01 5.241e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-14 23:22:59,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2024-08-14 23:23:21,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2897820.0, ans=0.125 2024-08-14 23:23:26,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 20, batch 14450, loss[loss=0.1072, beats_loss=0.01147, ecapa_loss=0.0001394, whisper_loss=0.09431, over 16521.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001522, whisper_loss=0.09097, over 3961305.24 frames. ], batch size: 66, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:23:30,359 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:23:31,412 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-14 23:23:36,218 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-14 23:23:50,690 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 23:24:09,109 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 23:24:10,558 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
23 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 23:24:31,527 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-20.pt 2024-08-14 23:25:02,517 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 0, loss[loss=0.112, beats_loss=0.008141, ecapa_loss=0.0001529, whisper_loss=0.1023, over 23945.00 frames. ], tot_loss[loss=0.112, beats_loss=0.008141, ecapa_loss=0.0001529, whisper_loss=0.1023, over 23945.00 frames. ], batch size: 91, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:25:02,518 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-14 23:25:46,201 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005489, whisper_loss=0.2481, over 922467.00 frames. 2024-08-14 23:26:02,307 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on SV_voxceleb1: loss=0.004256, beats_loss=0, ecapa_loss=0.0004256, whisper_loss=0, over 939242.00 frames. 2024-08-14 23:28:02,834 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on AT_audioset: loss=0.02343, beats_loss=0.02343, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 23:28:02,838 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-14 23:29:15,941 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.480e+05 2024-08-14 23:29:30,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.477e+01 2.723e+01 3.011e+01 4.734e+01, threshold=5.445e+01, percent-clipped=0.0 2024-08-14 23:29:46,995 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 23:29:50,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-08-14 23:30:13,505 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 50, loss[loss=0.1053, beats_loss=0.009731, ecapa_loss=0.0001512, whisper_loss=0.09404, over 17275.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009564, ecapa_loss=0.0001606, whisper_loss=0.09116, over 898814.01 frames. ], batch size: 70, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:31:15,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2899050.0, ans=0.0 2024-08-14 23:31:29,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2899150.0, ans=0.035 2024-08-14 23:31:44,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=22.5 2024-08-14 23:31:51,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=12.0 2024-08-14 23:31:53,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2024-08-14 23:31:55,326 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 23:32:13,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 100, loss[loss=0.1039, beats_loss=0.007561, ecapa_loss=0.0001751, whisper_loss=0.09461, over 13969.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009706, ecapa_loss=0.0001574, whisper_loss=0.08933, over 1555936.54 frames. 
], batch size: 55, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:32:16,851 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 23:32:56,821 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 23:32:59,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2899550.0, ans=0.07 2024-08-14 23:33:04,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2899550.0, ans=0.125 2024-08-14 23:33:09,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2899550.0, ans=0.0 2024-08-14 23:33:28,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2899650.0, ans=0.125 2024-08-14 23:33:31,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.639e+01 2.885e+01 3.247e+01 3.567e+02, threshold=5.770e+01, percent-clipped=1.0 2024-08-14 23:33:32,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2899650.0, ans=0.125 2024-08-14 23:33:40,805 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 23:33:58,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2899750.0, ans=0.1 2024-08-14 23:34:07,770 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 150, loss[loss=0.1157, beats_loss=0.01073, ecapa_loss=0.0001405, whisper_loss=0.1035, over 19539.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.00982, ecapa_loss=0.0001537, whisper_loss=0.09081, over 2075563.83 frames. 
], batch size: 77, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:34:08,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2899850.0, ans=0.0 2024-08-14 23:34:16,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2899850.0, ans=0.0 2024-08-14 23:34:16,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2899850.0, ans=10.0 2024-08-14 23:34:22,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-14 23:34:35,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2899950.0, ans=0.125 2024-08-14 23:34:40,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2899950.0, ans=0.0 2024-08-14 23:34:43,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2900050.0, ans=0.125 2024-08-14 23:34:50,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2900050.0, ans=0.125 2024-08-14 23:35:10,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2900150.0, ans=0.2 2024-08-14 23:35:18,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2900250.0, ans=0.1 2024-08-14 23:35:29,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2900250.0, ans=0.0 2024-08-14 23:35:31,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, 
batch 200, loss[loss=0.0935, beats_loss=0.01017, ecapa_loss=0.0001663, whisper_loss=0.08167, over 15449.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.009986, ecapa_loss=0.0001534, whisper_loss=0.09125, over 2453217.95 frames. ], batch size: 61, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:35:32,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2900350.0, ans=10.0 2024-08-14 23:35:38,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2900350.0, ans=0.0 2024-08-14 23:35:48,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2900450.0, ans=0.125 2024-08-14 23:35:53,226 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 23:36:06,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-14 23:36:23,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.304e+01 2.562e+01 2.943e+01 6.143e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 23:36:46,068 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 23:36:49,621 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 250, loss[loss=0.0802, beats_loss=0.009965, ecapa_loss=0.0002104, whisper_loss=0.06813, over 15593.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01016, ecapa_loss=0.0001526, whisper_loss=0.09107, over 2753510.53 frames. 
], batch size: 69, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:36:54,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2900850.0, ans=0.07 2024-08-14 23:37:14,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-14 23:37:27,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0 2024-08-14 23:37:28,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=15.0 2024-08-14 23:37:40,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2901150.0, ans=0.125 2024-08-14 23:37:41,526 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 23:38:01,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 300, loss[loss=0.1033, beats_loss=0.007897, ecapa_loss=0.0001519, whisper_loss=0.09384, over 21635.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01021, ecapa_loss=0.0001539, whisper_loss=0.09063, over 2967063.85 frames. 
], batch size: 83, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:38:12,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2901350.0, ans=0.2 2024-08-14 23:38:19,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2901450.0, ans=0.0 2024-08-14 23:38:49,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.216e+01 2.512e+01 2.821e+01 4.988e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-14 23:38:54,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2901650.0, ans=0.125 2024-08-14 23:39:13,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 350, loss[loss=0.08514, beats_loss=0.01136, ecapa_loss=0.0001291, whisper_loss=0.07249, over 15630.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001525, whisper_loss=0.09048, over 3158425.72 frames. ], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:39:24,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=15.0 2024-08-14 23:39:31,789 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 23:39:36,396 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 23:39:40,427 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 23:39:42,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902050.0, ans=0.1 2024-08-14 23:39:54,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=2902050.0, ans=12.0 2024-08-14 23:40:01,289 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 23:40:11,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2902250.0, ans=0.0 2024-08-14 23:40:15,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2902250.0, ans=0.125 2024-08-14 23:40:27,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 400, loss[loss=0.09782, beats_loss=0.01094, ecapa_loss=0.0001441, whisper_loss=0.08544, over 15345.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001528, whisper_loss=0.08994, over 3270760.82 frames. ], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:40:42,811 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 23:40:48,620 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 23:40:50,268 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 23:40:53,083 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 23:41:04,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2902550.0, ans=0.125 2024-08-14 23:41:18,398 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.395e+01 2.699e+01 3.154e+01 2.910e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-14 23:41:20,879 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:41:22,521 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 23:41:32,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2902750.0, ans=0.0 2024-08-14 23:41:36,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2902750.0, ans=0.125 2024-08-14 23:41:38,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2902750.0, ans=0.125 2024-08-14 23:41:41,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2902750.0, ans=0.2 2024-08-14 23:41:43,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2902750.0, ans=0.125 2024-08-14 23:41:43,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2902750.0, ans=0.0 2024-08-14 23:41:45,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 450, loss[loss=0.1035, beats_loss=0.01115, ecapa_loss=0.0001531, whisper_loss=0.0908, over 21946.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001524, whisper_loss=0.09015, over 3401317.64 frames. 
], batch size: 88, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:41:45,742 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 23:42:22,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2903050.0, ans=0.0 2024-08-14 23:43:04,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 500, loss[loss=0.08879, beats_loss=0.0124, ecapa_loss=0.000123, whisper_loss=0.07516, over 17399.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001525, whisper_loss=0.08987, over 3444576.82 frames. ], batch size: 69, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:43:05,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2903350.0, ans=0.025 2024-08-14 23:43:19,502 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 23:43:28,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2903450.0, ans=0.2 2024-08-14 23:43:40,442 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 23:43:44,817 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 23:43:54,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.41 vs. limit=22.5 2024-08-14 23:43:55,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.351e+01 2.577e+01 2.922e+01 3.343e+02, threshold=5.154e+01, percent-clipped=2.0 2024-08-14 23:43:57,407 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
31 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 23:43:59,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2903650.0, ans=0.0 2024-08-14 23:44:13,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2903750.0, ans=0.0 2024-08-14 23:44:13,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2903750.0, ans=0.125 2024-08-14 23:44:18,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 550, loss[loss=0.1083, beats_loss=0.00924, ecapa_loss=0.0001613, whisper_loss=0.09747, over 21904.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001515, whisper_loss=0.08971, over 3539804.46 frames. ], batch size: 89, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:44:18,772 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-14 23:44:30,114 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 23:44:46,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-14 23:44:48,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2904050.0, ans=0.07 2024-08-14 23:45:05,491 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 23:45:24,565 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 600, loss[loss=0.1009, beats_loss=0.008652, ecapa_loss=0.0002084, whisper_loss=0.09015, over 13386.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.0001518, whisper_loss=0.08926, over 3605733.67 frames. 
], batch size: 57, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:45:26,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2904350.0, ans=0.0 2024-08-14 23:45:31,508 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 23:45:37,890 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 23:46:08,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2904650.0, ans=0.07 2024-08-14 23:46:08,848 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.318e+01 2.599e+01 2.895e+01 9.632e+01, threshold=5.197e+01, percent-clipped=3.0 2024-08-14 23:46:18,461 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 23:46:29,972 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 650, loss[loss=0.1127, beats_loss=0.009166, ecapa_loss=0.0001719, whisper_loss=0.1018, over 19113.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001515, whisper_loss=0.08905, over 3651159.87 frames. ], batch size: 77, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:46:37,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=12.0 2024-08-14 23:46:43,402 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 23:46:51,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2904950.0, ans=0.2 2024-08-14 23:46:52,682 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 23:46:56,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2905050.0, ans=0.1 2024-08-14 23:47:08,086 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 23:47:28,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2905250.0, ans=0.0 2024-08-14 23:47:30,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2905250.0, ans=0.0 2024-08-14 23:47:34,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2905250.0, ans=0.125 2024-08-14 23:47:36,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 700, loss[loss=0.09947, beats_loss=0.01169, ecapa_loss=0.0001283, whisper_loss=0.0865, over 16589.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001521, whisper_loss=0.08958, over 3673778.17 frames. ], batch size: 67, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:47:36,576 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 23:47:41,580 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 23:47:53,590 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
16 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 23:48:21,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.342e+01 2.517e+01 2.889e+01 6.845e+01, threshold=5.033e+01, percent-clipped=2.0 2024-08-14 23:48:27,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2905750.0, ans=0.0 2024-08-14 23:48:35,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2905750.0, ans=0.125 2024-08-14 23:48:41,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2905850.0, ans=0.0 2024-08-14 23:48:41,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2905850.0, ans=0.07 2024-08-14 23:48:41,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 750, loss[loss=0.1131, beats_loss=0.008278, ecapa_loss=0.0001621, whisper_loss=0.1032, over 16637.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001529, whisper_loss=0.09024, over 3674227.14 frames. ], batch size: 63, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:48:58,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2905950.0, ans=0.125 2024-08-14 23:49:12,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 23:49:19,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-14 23:49:37,220 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 23:49:47,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 800, loss[loss=0.09113, beats_loss=0.01143, ecapa_loss=0.000164, whisper_loss=0.07805, over 22502.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001522, whisper_loss=0.08942, over 3703841.22 frames. ], batch size: 91, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:50:01,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-14 23:50:07,095 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 28 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 23:50:22,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2906550.0, ans=0.125 2024-08-14 23:50:31,719 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.222e+01 2.469e+01 2.749e+01 4.131e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-14 23:50:38,473 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 23:50:39,686 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 23:50:39,885 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:50:52,378 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 850, loss[loss=0.1056, beats_loss=0.01156, ecapa_loss=0.0001272, whisper_loss=0.09275, over 19615.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001503, whisper_loss=0.08921, over 3727257.82 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:50:59,210 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 23:51:10,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2906950.0, ans=0.05 2024-08-14 23:51:11,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2906950.0, ans=0.1 2024-08-14 23:51:26,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.76 vs. limit=10.0 2024-08-14 23:51:27,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2907050.0, ans=0.0 2024-08-14 23:51:29,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2907050.0, ans=0.0 2024-08-14 23:51:33,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2907150.0, ans=0.125 2024-08-14 23:51:40,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2907150.0, ans=0.125 2024-08-14 23:51:41,509 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 23:51:47,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2907250.0, ans=0.125 2024-08-14 23:51:58,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 900, loss[loss=0.08937, beats_loss=0.01029, ecapa_loss=0.0001298, whisper_loss=0.07779, over 14918.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001496, whisper_loss=0.08972, over 3740527.85 frames. 
], batch size: 56, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:52:06,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2907350.0, ans=0.2 2024-08-14 23:52:09,211 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 23:52:20,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=12.0 2024-08-14 23:52:22,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2907450.0, ans=0.125 2024-08-14 23:52:23,907 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 29 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 23:52:38,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2907650.0, ans=0.05 2024-08-14 23:52:43,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.513e+01 2.774e+01 3.197e+01 6.969e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 23:52:46,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.47 vs. limit=15.0 2024-08-14 23:53:03,785 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 23:53:05,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 950, loss[loss=0.1177, beats_loss=0.01028, ecapa_loss=0.0001067, whisper_loss=0.1064, over 24738.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001486, whisper_loss=0.08942, over 3771124.76 frames. ], batch size: 93, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:53:07,773 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 23:53:12,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2907850.0, ans=0.125 2024-08-14 23:53:14,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2907850.0, ans=0.1 2024-08-14 23:53:19,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2907950.0, ans=0.2 2024-08-14 23:53:27,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=22.5 2024-08-14 23:53:37,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=15.0 2024-08-14 23:54:01,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2908250.0, ans=0.125 2024-08-14 23:54:10,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1000, loss[loss=0.1006, beats_loss=0.01233, ecapa_loss=0.0001335, whisper_loss=0.0869, over 20992.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001488, whisper_loss=0.0885, over 3775129.07 frames. ], batch size: 83, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:54:11,087 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:54:13,402 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 23:54:22,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2908450.0, ans=0.1 2024-08-14 23:54:28,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2908450.0, ans=0.0 2024-08-14 23:54:31,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-08-14 23:54:41,989 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 23:54:42,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-14 23:54:51,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2908650.0, ans=0.125 2024-08-14 23:54:55,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.358e+01 2.563e+01 2.906e+01 4.216e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-14 23:54:56,438 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 23:54:58,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0 2024-08-14 23:55:09,767 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 23:55:15,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1050, loss[loss=0.1163, beats_loss=0.008523, ecapa_loss=0.0001506, whisper_loss=0.1063, over 14390.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001488, whisper_loss=0.08918, over 3778471.11 frames. 
], batch size: 54, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:55:21,027 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 23:55:31,828 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 23:55:35,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2908950.0, ans=0.0 2024-08-14 23:55:41,952 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 23:55:44,706 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.038e+01 2024-08-14 23:55:58,218 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 23:56:04,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0 2024-08-14 23:56:21,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1100, loss[loss=0.105, beats_loss=0.009693, ecapa_loss=0.0001856, whisper_loss=0.09346, over 20555.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001486, whisper_loss=0.08969, over 3799086.79 frames. ], batch size: 85, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:56:35,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2909450.0, ans=0.0 2024-08-14 23:56:39,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2909450.0, ans=0.1 2024-08-14 23:56:42,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. 
limit=15.0 2024-08-14 23:56:54,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2909550.0, ans=0.125 2024-08-14 23:57:05,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.324e+01 2.535e+01 2.853e+01 4.555e+01, threshold=5.069e+01, percent-clipped=0.0 2024-08-14 23:57:07,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2909650.0, ans=0.125 2024-08-14 23:57:13,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-08-14 23:57:15,163 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 23:57:15,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2909750.0, ans=0.0 2024-08-14 23:57:26,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1150, loss[loss=0.1054, beats_loss=0.01246, ecapa_loss=0.0001075, whisper_loss=0.09189, over 17157.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001478, whisper_loss=0.0895, over 3804373.60 frames. ], batch size: 67, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:57:49,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-14 23:58:07,497 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 23:58:17,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2910250.0, ans=0.04949747468305833 2024-08-14 23:58:18,795 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08089441806077957, model_norm_threshold=50.69233322143555 2024-08-14 23:58:18,982 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.506e+05, grad_sumsq=1.506e+05, orig_rms_sq=1.000e+00 2024-08-14 23:58:26,030 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 23:58:31,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2910350.0, ans=0.05 2024-08-14 23:58:32,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1200, loss[loss=0.09051, beats_loss=0.01077, ecapa_loss=0.0001175, whisper_loss=0.07857, over 18754.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001477, whisper_loss=0.08914, over 3786522.90 frames. ], batch size: 72, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:58:34,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2910350.0, ans=10.0 2024-08-14 23:58:38,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2910350.0, ans=0.0 2024-08-14 23:58:40,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. 
limit=15.0 2024-08-14 23:58:51,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2910450.0, ans=0.0 2024-08-14 23:58:52,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2910450.0, ans=0.125 2024-08-14 23:58:58,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.80 vs. limit=10.0 2024-08-14 23:59:05,566 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 23:59:08,301 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 23:59:18,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.273e+01 2.471e+01 2.921e+01 6.266e+02, threshold=4.943e+01, percent-clipped=3.0 2024-08-14 23:59:20,061 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 23:59:30,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2910750.0, ans=0.2 2024-08-14 23:59:37,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2910750.0, ans=0.0 2024-08-14 23:59:37,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-14 23:59:39,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1250, loss[loss=0.1126, beats_loss=0.008983, ecapa_loss=0.0001537, whisper_loss=0.1021, over 18220.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.08983, over 3787679.06 frames. 
], batch size: 70, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:59:45,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2910850.0, ans=0.0 2024-08-14 23:59:58,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-15 00:00:04,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2910950.0, ans=0.0 2024-08-15 00:00:10,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=2911050.0, ans=22.5 2024-08-15 00:00:25,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2911150.0, ans=0.1 2024-08-15 00:00:26,781 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 00:00:51,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1300, loss[loss=0.1042, beats_loss=0.01031, ecapa_loss=0.0001808, whisper_loss=0.09205, over 13477.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001481, whisper_loss=0.08942, over 3780968.07 frames. ], batch size: 54, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:01:16,073 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 00:01:23,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2911550.0, ans=0.0 2024-08-15 00:01:27,503 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 00:01:36,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2911650.0, ans=0.1 2024-08-15 00:01:40,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.220e+01 2.514e+01 2.820e+01 5.708e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-15 00:01:58,531 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 00:02:07,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1350, loss[loss=0.09872, beats_loss=0.01225, ecapa_loss=0.0001475, whisper_loss=0.08499, over 22181.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001493, whisper_loss=0.08988, over 3797853.17 frames. ], batch size: 89, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:02:12,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2911850.0, ans=0.125 2024-08-15 00:02:19,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2911850.0, ans=0.0 2024-08-15 00:02:24,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2911950.0, ans=0.0 2024-08-15 00:02:32,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2911950.0, ans=0.0 2024-08-15 00:02:39,435 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-15 00:02:48,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.94 vs. 
limit=15.0 2024-08-15 00:02:54,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=15.0 2024-08-15 00:02:55,018 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 12 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 00:02:57,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2912150.0, ans=0.0 2024-08-15 00:03:04,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2912150.0, ans=0.1 2024-08-15 00:03:26,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1400, loss[loss=0.1092, beats_loss=0.01232, ecapa_loss=0.0001464, whisper_loss=0.09544, over 23166.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001479, whisper_loss=0.08973, over 3791567.16 frames. ], batch size: 93, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:03:39,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2912350.0, ans=0.1 2024-08-15 00:03:47,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2912450.0, ans=0.125 2024-08-15 00:03:54,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2024-08-15 00:03:56,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=22.5 2024-08-15 00:03:58,207 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
30 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-15 00:04:04,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2024-08-15 00:04:10,067 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 19 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 00:04:18,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.213e+01 2.409e+01 2.857e+01 4.258e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-15 00:04:29,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-08-15 00:04:30,474 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 00:04:36,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2912750.0, ans=0.125 2024-08-15 00:04:39,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2912750.0, ans=0.2 2024-08-15 00:05:00,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1450, loss[loss=0.07283, beats_loss=0.01198, ecapa_loss=0.0001416, whisper_loss=0.05944, over 18809.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001484, whisper_loss=0.08975, over 3811893.15 frames. ], batch size: 75, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:05:07,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2912850.0, ans=0.125 2024-08-15 00:05:31,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2913050.0, ans=0.125 2024-08-15 00:05:38,565 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
21 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-15 00:05:54,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.78 vs. limit=22.5 2024-08-15 00:05:55,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2913150.0, ans=0.125 2024-08-15 00:06:14,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2913250.0, ans=0.1 2024-08-15 00:06:14,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2913250.0, ans=0.0 2024-08-15 00:06:16,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2913250.0, ans=0.0 2024-08-15 00:06:25,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1500, loss[loss=0.1027, beats_loss=0.009822, ecapa_loss=0.0001618, whisper_loss=0.0913, over 17256.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.08934, over 3812381.00 frames. 
], batch size: 68, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:06:48,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2913450.0, ans=0.0 2024-08-15 00:07:10,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2913650.0, ans=0.125 2024-08-15 00:07:13,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2913650.0, ans=0.0 2024-08-15 00:07:20,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.340e+01 2.607e+01 2.901e+01 2.472e+02, threshold=5.215e+01, percent-clipped=2.0 2024-08-15 00:07:44,779 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-15 00:07:54,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1550, loss[loss=0.09384, beats_loss=0.009932, ecapa_loss=0.0001678, whisper_loss=0.08223, over 16503.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001496, whisper_loss=0.08978, over 3789779.87 frames. 
], batch size: 67, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:08:03,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2913850.0, ans=0.0 2024-08-15 00:08:05,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2913850.0, ans=0.125 2024-08-15 00:08:34,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914050.0, ans=0.1 2024-08-15 00:08:54,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2914050.0, ans=0.125 2024-08-15 00:08:57,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.16 vs. limit=5.0 2024-08-15 00:09:29,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-15 00:09:33,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2914250.0, ans=0.125 2024-08-15 00:09:34,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2914350.0, ans=0.125 2024-08-15 00:09:36,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1600, loss[loss=0.08188, beats_loss=0.01269, ecapa_loss=0.0001559, whisper_loss=0.06763, over 14684.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001489, whisper_loss=0.08995, over 3798390.63 frames. ], batch size: 57, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:09:51,861 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 00:09:53,294 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 00:09:57,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-15 00:10:36,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2914550.0, ans=0.125 2024-08-15 00:10:57,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.368e+01 2.559e+01 2.879e+01 3.704e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-15 00:11:29,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2914750.0, ans=0.125 2024-08-15 00:11:36,674 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1650, loss[loss=0.08173, beats_loss=0.01098, ecapa_loss=0.00014, whisper_loss=0.06935, over 17838.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001478, whisper_loss=0.09023, over 3827390.68 frames. ], batch size: 68, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:12:09,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2024-08-15 00:12:48,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2024-08-15 00:13:03,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2915150.0, ans=0.125 2024-08-15 00:13:36,045 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1700, loss[loss=0.1008, beats_loss=0.0109, ecapa_loss=0.0001359, whisper_loss=0.08858, over 22254.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001468, whisper_loss=0.0904, over 3830448.32 frames. 
], batch size: 88, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:13:52,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2024-08-15 00:14:13,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2915450.0, ans=0.125 2024-08-15 00:14:22,743 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-15 00:14:25,456 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 00:14:55,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.594e+01 2.900e+01 5.252e+01, threshold=5.187e+01, percent-clipped=1.0 2024-08-15 00:14:56,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2915650.0, ans=0.125 2024-08-15 00:15:26,954 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 00:15:30,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1750, loss[loss=0.09991, beats_loss=0.01257, ecapa_loss=9.907e-05, whisper_loss=0.08635, over 22002.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.000147, whisper_loss=0.09006, over 3822420.93 frames. ], batch size: 85, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:15:37,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2915850.0, ans=0.125 2024-08-15 00:15:39,856 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 00:15:43,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2915850.0, ans=0.2 2024-08-15 00:15:58,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2915950.0, ans=0.1 2024-08-15 00:16:16,389 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 00:16:19,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2916150.0, ans=0.0 2024-08-15 00:16:22,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.01 vs. limit=22.5 2024-08-15 00:16:23,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2916150.0, ans=0.07 2024-08-15 00:16:23,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2916150.0, ans=0.04949747468305833 2024-08-15 00:16:24,531 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 00:16:24,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2916150.0, ans=0.125 2024-08-15 00:16:39,046 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 00:16:42,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1800, loss[loss=0.09904, beats_loss=0.01099, ecapa_loss=0.0001785, whisper_loss=0.08626, over 21632.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001483, whisper_loss=0.09064, over 3836181.40 frames. 
], batch size: 93, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:17:18,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2916550.0, ans=0.125 2024-08-15 00:17:19,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2916550.0, ans=0.2 2024-08-15 00:17:21,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2916550.0, ans=0.04949747468305833 2024-08-15 00:17:29,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.322e+01 2.601e+01 3.039e+01 2.068e+02, threshold=5.202e+01, percent-clipped=5.0 2024-08-15 00:17:31,092 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 00:17:31,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2916650.0, ans=0.125 2024-08-15 00:17:44,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2916750.0, ans=0.1 2024-08-15 00:17:44,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2024-08-15 00:17:47,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2916750.0, ans=0.125 2024-08-15 00:17:51,527 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.216e-01 2024-08-15 00:17:52,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1850, loss[loss=0.1037, beats_loss=0.008601, ecapa_loss=0.0001985, whisper_loss=0.09316, over 20589.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001483, whisper_loss=0.08996, over 3826187.30 frames. ], batch size: 88, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:17:53,937 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 00:17:57,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2916850.0, ans=0.0 2024-08-15 00:18:00,409 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 00:18:06,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2916950.0, ans=0.1 2024-08-15 00:18:08,746 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 00:18:10,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2916950.0, ans=0.125 2024-08-15 00:18:19,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2916950.0, ans=0.125 2024-08-15 00:19:10,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1900, loss[loss=0.09873, beats_loss=0.01088, ecapa_loss=0.0001587, whisper_loss=0.08626, over 21347.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001486, whisper_loss=0.09038, over 3832204.40 frames. ], batch size: 86, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:19:44,773 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-15 00:19:58,968 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 00:20:05,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2024-08-15 00:20:06,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.388e+01 2.711e+01 3.006e+01 3.511e+02, threshold=5.422e+01, percent-clipped=5.0 2024-08-15 00:20:27,597 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 00:20:29,015 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 00:20:30,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 1950, loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001622, whisper_loss=0.09195, over 22837.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001504, whisper_loss=0.09027, over 3833759.27 frames. ], batch size: 93, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:20:32,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2917850.0, ans=0.0 2024-08-15 00:20:38,786 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 00:20:39,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2024-08-15 00:20:59,772 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 17 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 00:21:01,585 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
17 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 00:21:26,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2918150.0, ans=0.1 2024-08-15 00:21:26,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2918150.0, ans=0.1 2024-08-15 00:21:39,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2024-08-15 00:21:40,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2918250.0, ans=0.125 2024-08-15 00:21:51,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2918350.0, ans=0.1 2024-08-15 00:21:52,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2000, loss[loss=0.1002, beats_loss=0.01215, ecapa_loss=0.0001504, whisper_loss=0.08651, over 21243.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001502, whisper_loss=0.09021, over 3819091.69 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:22:19,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2918450.0, ans=0.0 2024-08-15 00:22:26,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2918550.0, ans=0.125 2024-08-15 00:22:27,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-15 00:22:36,907 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 00:22:46,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2024-08-15 00:22:49,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.256e+01 2.494e+01 2.913e+01 5.528e+01, threshold=4.988e+01, percent-clipped=1.0 2024-08-15 00:22:56,725 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 00:23:03,209 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 00:23:15,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2050, loss[loss=0.08876, beats_loss=0.01095, ecapa_loss=0.0001463, whisper_loss=0.07635, over 20881.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.000149, whisper_loss=0.08919, over 3823673.04 frames. ], batch size: 86, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:23:15,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2918850.0, ans=0.1 2024-08-15 00:23:17,117 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 00:23:18,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2918850.0, ans=0.1 2024-08-15 00:23:21,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2918850.0, ans=0.0 2024-08-15 00:23:48,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2919050.0, ans=0.125 2024-08-15 00:23:51,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.46 vs. 
limit=22.5 2024-08-15 00:23:53,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2919050.0, ans=0.2 2024-08-15 00:24:36,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2100, loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001307, whisper_loss=0.09168, over 18883.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001477, whisper_loss=0.08954, over 3809981.13 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:24:37,073 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 00:24:40,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2919350.0, ans=0.0 2024-08-15 00:25:05,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2919450.0, ans=0.0 2024-08-15 00:25:08,860 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 00:25:19,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0 2024-08-15 00:25:31,861 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.305e+01 2.496e+01 2.863e+01 3.632e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 00:25:33,662 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 00:25:35,291 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-15 00:25:37,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2919650.0, ans=0.1 2024-08-15 00:25:48,111 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 00:25:51,075 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-15 00:25:55,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2919750.0, ans=0.1 2024-08-15 00:25:57,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2150, loss[loss=0.09024, beats_loss=0.01312, ecapa_loss=0.0001286, whisper_loss=0.07583, over 22437.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.000146, whisper_loss=0.08983, over 3820256.32 frames. ], batch size: 91, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:26:00,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2919850.0, ans=0.0 2024-08-15 00:26:03,686 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 00:26:05,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2919850.0, ans=0.125 2024-08-15 00:26:22,179 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-292000.pt 2024-08-15 00:26:26,421 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 00:26:28,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. limit=10.0 2024-08-15 00:26:39,052 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 00:26:57,234 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
21 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 00:27:04,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2920150.0, ans=0.1 2024-08-15 00:27:17,903 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 00:27:26,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2200, loss[loss=0.0984, beats_loss=0.01118, ecapa_loss=0.000157, whisper_loss=0.08565, over 21872.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01071, ecapa_loss=0.0001458, whisper_loss=0.08969, over 3813152.38 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:28:02,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2920550.0, ans=0.0 2024-08-15 00:28:10,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2920550.0, ans=0.1 2024-08-15 00:28:21,218 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 18 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 00:28:22,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.682e+01 3.041e+01 4.507e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-15 00:28:34,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2920750.0, ans=0.0 2024-08-15 00:28:41,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2920750.0, ans=0.0 2024-08-15 00:28:49,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2250, loss[loss=0.08966, beats_loss=0.01074, ecapa_loss=0.0001644, whisper_loss=0.07727, over 15336.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001471, whisper_loss=0.08937, over 3815308.96 frames. 
], batch size: 63, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:28:50,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2920850.0, ans=0.0 2024-08-15 00:28:56,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2920850.0, ans=0.0 2024-08-15 00:29:15,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2920950.0, ans=0.2 2024-08-15 00:29:21,736 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 00:29:49,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2921150.0, ans=0.125 2024-08-15 00:29:54,821 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-15 00:30:11,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2300, loss[loss=0.105, beats_loss=0.01092, ecapa_loss=0.0001481, whisper_loss=0.09258, over 21941.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.00015, whisper_loss=0.09019, over 3842608.83 frames. ], batch size: 88, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:30:32,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2921450.0, ans=0.0 2024-08-15 00:30:44,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-15 00:30:45,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. 
limit=10.0 2024-08-15 00:30:51,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2921550.0, ans=0.2 2024-08-15 00:30:56,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2921650.0, ans=0.125 2024-08-15 00:30:58,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2921650.0, ans=0.04949747468305833 2024-08-15 00:31:04,698 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.272e+01 2.490e+01 2.826e+01 4.749e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-15 00:31:07,856 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 34 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 00:31:23,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2921750.0, ans=0.0 2024-08-15 00:31:29,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-15 00:31:32,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2350, loss[loss=0.1074, beats_loss=0.01073, ecapa_loss=0.0001363, whisper_loss=0.09533, over 22485.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001496, whisper_loss=0.0907, over 3840693.59 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:31:35,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2921850.0, ans=0.125 2024-08-15 00:31:38,809 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 00:31:57,429 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:32:09,819 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 00:32:41,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2922250.0, ans=0.2 2024-08-15 00:32:53,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2400, loss[loss=0.1156, beats_loss=0.01052, ecapa_loss=0.00016, whisper_loss=0.1034, over 21977.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.00015, whisper_loss=0.09146, over 3830272.41 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:33:12,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2922450.0, ans=0.0 2024-08-15 00:33:18,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2922450.0, ans=0.125 2024-08-15 00:33:23,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2922450.0, ans=0.125 2024-08-15 00:33:43,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2922650.0, ans=0.125 2024-08-15 00:33:50,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.275e+01 2.495e+01 2.898e+01 2.121e+02, threshold=4.990e+01, percent-clipped=1.0 2024-08-15 00:34:15,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2450, loss[loss=0.1145, beats_loss=0.01129, ecapa_loss=0.0001191, whisper_loss=0.102, over 19298.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001494, whisper_loss=0.09113, over 3824273.11 frames. 
], batch size: 75, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:34:15,861 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 00:34:34,644 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 00:34:59,100 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 00:35:28,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2923250.0, ans=0.0 2024-08-15 00:35:29,831 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 00:35:31,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. limit=10.0 2024-08-15 00:35:38,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2500, loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.0001298, whisper_loss=0.09136, over 15568.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001499, whisper_loss=0.09069, over 3795772.38 frames. ], batch size: 59, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:35:41,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2923350.0, ans=0.125 2024-08-15 00:35:43,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2923350.0, ans=0.09899494936611666 2024-08-15 00:35:52,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2923450.0, ans=0.0 2024-08-15 00:36:20,001 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 00:36:22,848 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 00:36:32,284 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.328e+01 2.584e+01 2.965e+01 7.495e+01, threshold=5.168e+01, percent-clipped=2.0 2024-08-15 00:36:38,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2923650.0, ans=0.125 2024-08-15 00:36:42,181 WARNING [optim.py:496] (0/4) Scaling gradients by 0.026615602895617485, model_norm_threshold=51.67815017700195 2024-08-15 00:36:42,374 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.809e+05, grad_sumsq=6.809e+05, orig_rms_sq=1.000e+00 2024-08-15 00:36:44,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2923750.0, ans=0.125 2024-08-15 00:36:56,274 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 00:36:59,190 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2550, loss[loss=0.08979, beats_loss=0.01011, ecapa_loss=0.0001684, whisper_loss=0.07799, over 17645.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01047, ecapa_loss=0.0001493, whisper_loss=0.09187, over 3810845.66 frames. ], batch size: 71, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:37:15,496 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 00:37:22,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=22.5 2024-08-15 00:37:33,903 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 00:38:08,066 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-15 00:38:15,497 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-15 00:38:16,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2600, loss[loss=0.1002, beats_loss=0.007575, ecapa_loss=0.0001886, whisper_loss=0.09069, over 16857.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.00015, whisper_loss=0.09116, over 3831704.37 frames. ], batch size: 68, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:38:17,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2924350.0, ans=0.0 2024-08-15 00:38:20,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2924350.0, ans=0.125 2024-08-15 00:38:51,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2924550.0, ans=0.0 2024-08-15 00:39:08,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.339e+01 2.635e+01 2.939e+01 1.942e+03, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 00:39:26,997 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-15 00:39:32,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2650, loss[loss=0.1016, beats_loss=0.01179, ecapa_loss=0.0001283, whisper_loss=0.08852, over 14164.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001502, whisper_loss=0.09106, over 3826489.37 frames. ], batch size: 56, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:39:36,190 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 00:40:03,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2024-08-15 00:40:08,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2925050.0, ans=0.0 2024-08-15 00:40:12,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2925050.0, ans=0.2 2024-08-15 00:40:15,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=12.0 2024-08-15 00:40:18,256 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-15 00:40:35,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2925250.0, ans=0.0 2024-08-15 00:40:36,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2925250.0, ans=0.0 2024-08-15 00:40:50,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2700, loss[loss=0.07507, beats_loss=0.01089, ecapa_loss=0.0001947, whisper_loss=0.06223, over 18185.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.0001488, whisper_loss=0.09153, over 3852481.77 frames. 
], batch size: 78, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:40:54,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2925350.0, ans=0.125 2024-08-15 00:41:26,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2925550.0, ans=0.125 2024-08-15 00:41:26,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2925550.0, ans=0.1 2024-08-15 00:41:43,696 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.279e+01 2.490e+01 2.726e+01 4.419e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-15 00:41:57,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2925750.0, ans=0.0 2024-08-15 00:42:00,999 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 00:42:08,499 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 00:42:09,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2750, loss[loss=0.09431, beats_loss=0.01182, ecapa_loss=0.0001317, whisper_loss=0.08117, over 15675.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001498, whisper_loss=0.09102, over 3860121.17 frames. ], batch size: 61, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:42:40,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2926050.0, ans=0.2 2024-08-15 00:42:45,361 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 00:42:50,959 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-15 00:42:54,212 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 00:42:57,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2926150.0, ans=0.125 2024-08-15 00:42:59,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2926150.0, ans=0.0 2024-08-15 00:43:08,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2926150.0, ans=0.0 2024-08-15 00:43:15,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-15 00:43:22,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2926250.0, ans=0.0 2024-08-15 00:43:24,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2926250.0, ans=0.125 2024-08-15 00:43:31,324 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 00:43:32,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2800, loss[loss=0.0943, beats_loss=0.01087, ecapa_loss=0.0001594, whisper_loss=0.08184, over 19327.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001496, whisper_loss=0.09138, over 3878369.99 frames. ], batch size: 78, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:43:34,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.46 vs. 
limit=10.0 2024-08-15 00:43:43,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-08-15 00:43:44,366 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 00:43:52,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2926450.0, ans=0.0 2024-08-15 00:44:04,041 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 00:44:18,676 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.146e-02 2024-08-15 00:44:31,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.375e+01 2.624e+01 2.895e+01 7.200e+01, threshold=5.247e+01, percent-clipped=1.0 2024-08-15 00:44:45,789 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 00:44:54,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2926750.0, ans=0.125 2024-08-15 00:45:00,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2850, loss[loss=0.07999, beats_loss=0.01145, ecapa_loss=0.0001219, whisper_loss=0.06733, over 18853.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01049, ecapa_loss=0.0001494, whisper_loss=0.09161, over 3904567.06 frames. ], batch size: 71, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:45:08,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2926850.0, ans=0.125 2024-08-15 00:45:13,135 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-15 00:45:15,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2926950.0, ans=0.0 2024-08-15 00:45:30,639 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 00:45:35,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2927050.0, ans=0.125 2024-08-15 00:45:47,415 INFO [train_multi_KD3.py:844] (0/4) A total of 100 cuts. 26 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-15 00:46:08,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2927250.0, ans=0.125 2024-08-15 00:46:10,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2927250.0, ans=0.1 2024-08-15 00:46:21,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2927250.0, ans=0.125 2024-08-15 00:46:22,906 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 00:46:24,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2900, loss[loss=0.1091, beats_loss=0.01153, ecapa_loss=0.0001606, whisper_loss=0.09592, over 21305.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001505, whisper_loss=0.09097, over 3914252.90 frames. ], batch size: 88, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:46:33,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. 
limit=22.5 2024-08-15 00:46:43,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2927450.0, ans=0.1 2024-08-15 00:46:51,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2927450.0, ans=0.125 2024-08-15 00:46:54,935 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 00:47:17,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=12.0 2024-08-15 00:47:23,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2927650.0, ans=0.0 2024-08-15 00:47:24,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.340e+01 2.621e+01 2.930e+01 9.659e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-15 00:47:43,312 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 00:47:52,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 2950, loss[loss=0.1079, beats_loss=0.01305, ecapa_loss=0.000158, whisper_loss=0.09324, over 17504.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.000152, whisper_loss=0.09117, over 3937608.85 frames. ], batch size: 70, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:48:00,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2927850.0, ans=0.05 2024-08-15 00:48:02,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.54 vs. 
limit=15.0 2024-08-15 00:48:02,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2024-08-15 00:48:34,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2928050.0, ans=0.1 2024-08-15 00:48:36,479 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.214e+05 2024-08-15 00:48:39,872 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 00:48:50,994 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 00:49:06,605 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-15 00:49:20,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2928250.0, ans=0.1 2024-08-15 00:49:22,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3000, loss[loss=0.09008, beats_loss=0.01248, ecapa_loss=0.0001448, whisper_loss=0.07615, over 19356.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001532, whisper_loss=0.09144, over 3922354.66 frames. ], batch size: 75, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:49:22,677 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 00:49:42,689 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1680, 2.1159, 3.5011, 3.3001], device='cuda:0') 2024-08-15 00:50:02,517 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005339, whisper_loss=0.248, over 922467.00 frames. 
2024-08-15 00:50:21,779 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-15 00:51:35,214 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8362, 2.1611, 1.7757, 1.2821, 1.5671, 1.5547, 1.9637, 1.9138], device='cuda:0') 2024-08-15 00:52:15,951 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 00:52:15,955 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 00:52:21,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-15 00:52:36,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2928450.0, ans=0.125 2024-08-15 00:52:36,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2928450.0, ans=0.125 2024-08-15 00:52:39,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2024-08-15 00:52:56,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2928550.0, ans=0.07 2024-08-15 00:52:59,626 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 00:53:08,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.494e+01 2.724e+01 3.053e+01 2.712e+02, threshold=5.448e+01, percent-clipped=2.0 2024-08-15 00:53:26,245 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 00:53:37,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3050, loss[loss=0.09952, beats_loss=0.009703, ecapa_loss=0.0001682, whisper_loss=0.08814, over 14852.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001531, whisper_loss=0.09137, over 3942303.36 frames. ], batch size: 60, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:53:44,297 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 00:53:54,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2928950.0, ans=15.0 2024-08-15 00:54:00,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2928950.0, ans=0.125 2024-08-15 00:54:04,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2928950.0, ans=0.125 2024-08-15 00:54:17,034 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 00:54:17,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2929050.0, ans=0.125 2024-08-15 00:54:17,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2929050.0, ans=0.125 2024-08-15 00:54:27,703 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 00:54:30,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2929150.0, ans=0.09899494936611666 2024-08-15 00:54:34,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2929150.0, ans=0.05 2024-08-15 00:54:44,020 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 00:54:59,612 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=9.102e-01 2024-08-15 00:55:03,154 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.533e-03 2024-08-15 00:55:06,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3100, loss[loss=0.09524, beats_loss=0.01018, ecapa_loss=0.0001698, whisper_loss=0.08337, over 18188.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001528, whisper_loss=0.09081, over 3946223.09 frames. ], batch size: 75, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:55:27,412 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 00:55:56,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2929550.0, ans=0.09899494936611666 2024-08-15 00:56:00,230 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:56:05,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.234e+01 2.421e+01 2.829e+01 3.932e+01, threshold=4.842e+01, percent-clipped=0.0 2024-08-15 00:56:12,537 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-15 00:56:24,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2929750.0, ans=0.125 2024-08-15 00:56:34,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3150, loss[loss=0.09273, beats_loss=0.01142, ecapa_loss=0.0001382, whisper_loss=0.07993, over 16895.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001525, whisper_loss=0.09175, over 3900044.06 frames. ], batch size: 65, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:56:41,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2929850.0, ans=0.125 2024-08-15 00:56:43,721 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 35 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 00:57:02,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2929950.0, ans=0.0 2024-08-15 00:57:04,527 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:57:09,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2930050.0, ans=0.2 2024-08-15 00:57:16,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2930050.0, ans=0.125 2024-08-15 00:57:48,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2930250.0, ans=0.125 2024-08-15 00:57:58,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3200, loss[loss=0.1124, beats_loss=0.01059, ecapa_loss=0.0001401, whisper_loss=0.1004, over 18878.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.000152, whisper_loss=0.0921, over 3869634.83 frames. 
], batch size: 72, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:58:00,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2930350.0, ans=0.0 2024-08-15 00:58:09,158 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 00:58:58,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.340e+01 2.607e+01 2.888e+01 4.627e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-15 00:59:08,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2930750.0, ans=0.0 2024-08-15 00:59:10,519 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07108230143785477, model_norm_threshold=52.141944885253906 2024-08-15 00:59:10,716 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.209e+04, grad_sumsq=9.209e+04, orig_rms_sq=1.000e+00 2024-08-15 00:59:18,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2930750.0, ans=0.95 2024-08-15 00:59:23,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2930750.0, ans=0.0 2024-08-15 00:59:26,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3250, loss[loss=0.1149, beats_loss=0.009282, ecapa_loss=0.0001812, whisper_loss=0.1038, over 21386.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01056, ecapa_loss=0.0001529, whisper_loss=0.09244, over 3869701.02 frames. 
], batch size: 84, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:59:32,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2930850.0, ans=0.1 2024-08-15 00:59:50,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=22.5 2024-08-15 00:59:58,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-15 01:00:08,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2931050.0, ans=0.1 2024-08-15 01:00:12,454 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 01:00:16,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2931050.0, ans=0.1 2024-08-15 01:00:52,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2931350.0, ans=0.125 2024-08-15 01:00:53,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3300, loss[loss=0.07689, beats_loss=0.01225, ecapa_loss=0.0002056, whisper_loss=0.06259, over 17872.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01056, ecapa_loss=0.0001535, whisper_loss=0.09237, over 3860125.63 frames. ], batch size: 80, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:01:06,681 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 01:01:15,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. 
limit=15.0 2024-08-15 01:01:21,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2931450.0, ans=0.1 2024-08-15 01:01:38,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2931550.0, ans=0.125 2024-08-15 01:01:42,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-15 01:01:47,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.317e+01 2.622e+01 2.887e+01 7.335e+02, threshold=5.244e+01, percent-clipped=2.0 2024-08-15 01:01:51,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2931650.0, ans=0.125 2024-08-15 01:02:09,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2931750.0, ans=0.125 2024-08-15 01:02:14,770 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3350, loss[loss=0.1142, beats_loss=0.008783, ecapa_loss=0.0001858, whisper_loss=0.1036, over 21265.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01059, ecapa_loss=0.0001529, whisper_loss=0.09255, over 3876958.23 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:02:17,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2931850.0, ans=0.125 2024-08-15 01:02:37,932 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 01:02:39,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2931950.0, ans=0.125 2024-08-15 01:02:50,716 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 01:02:51,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-15 01:02:58,913 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 01:03:16,557 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 01:03:27,808 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-15 01:03:45,974 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3400, loss[loss=0.08429, beats_loss=0.0129, ecapa_loss=0.0001406, whisper_loss=0.06998, over 21549.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01054, ecapa_loss=0.0001525, whisper_loss=0.09231, over 3890962.31 frames. ], batch size: 88, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:04:10,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2932450.0, ans=0.1 2024-08-15 01:04:22,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2932550.0, ans=10.0 2024-08-15 01:04:26,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2932550.0, ans=0.0 2024-08-15 01:04:29,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2932550.0, ans=0.125 2024-08-15 01:04:33,027 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 01:04:34,317 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-15 01:04:43,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2932650.0, ans=0.125 2024-08-15 01:04:44,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.349e+01 2.635e+01 2.975e+01 2.960e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 01:05:02,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2932750.0, ans=0.04949747468305833 2024-08-15 01:05:09,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2932750.0, ans=0.1 2024-08-15 01:05:14,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3450, loss[loss=0.1192, beats_loss=0.009655, ecapa_loss=0.0001352, whisper_loss=0.1082, over 16616.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01054, ecapa_loss=0.0001536, whisper_loss=0.09186, over 3887727.74 frames. ], batch size: 62, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:05:14,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2932850.0, ans=0.0 2024-08-15 01:05:14,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2932850.0, ans=0.95 2024-08-15 01:05:30,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0
2024-08-15 01:06:29,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2933250.0, ans=0.125
2024-08-15 01:06:33,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2933250.0, ans=0.125
2024-08-15 01:06:47,211 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3500, loss[loss=0.1052, beats_loss=0.01024, ecapa_loss=0.0001364, whisper_loss=0.09364, over 22874.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001527, whisper_loss=0.09104, over 3867560.83 frames. ], batch size: 91, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:06:52,443 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0579170286655426, model_norm_threshold=52.703590393066406
2024-08-15 01:06:52,624 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.021e+05, grad_sumsq=2.962e+04, orig_rms_sq=3.448e+00
2024-08-15 01:06:53,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0
2024-08-15 01:07:11,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0
2024-08-15 01:07:12,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2933450.0, ans=0.125
2024-08-15 01:07:31,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=12.0
2024-08-15 01:07:37,567 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 from AS
2024-08-15 01:07:43,090 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts.
20 from LS+wenet, 26 from Vox, 36 from AS
2024-08-15 01:07:44,490 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.354e+01 2.571e+01 2.919e+01 9.100e+02, threshold=5.142e+01, percent-clipped=1.0
2024-08-15 01:07:46,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2933650.0, ans=0.0
2024-08-15 01:07:55,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=15.0
2024-08-15 01:08:01,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2933750.0, ans=0.2
2024-08-15 01:08:10,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3550, loss[loss=0.07734, beats_loss=0.01336, ecapa_loss=0.0001108, whisper_loss=0.06287, over 15966.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001537, whisper_loss=0.09141, over 3874281.13 frames. ], batch size: 61, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:08:18,891 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 from AS
2024-08-15 01:08:20,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2933850.0, ans=0.0
2024-08-15 01:08:31,285 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 20 from Vox, 32 from AS
2024-08-15 01:08:36,358 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 20 from Vox, 28 from AS
2024-08-15 01:09:00,991 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts.
22 from LS+wenet, 28 from Vox, 31 from AS
2024-08-15 01:09:01,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2934150.0, ans=0.125
2024-08-15 01:09:20,404 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 01:09:33,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3600, loss[loss=0.1108, beats_loss=0.01082, ecapa_loss=0.0001387, whisper_loss=0.09858, over 19839.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001525, whisper_loss=0.09123, over 3906126.13 frames. ], batch size: 80, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:09:48,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2934350.0, ans=0.125
2024-08-15 01:10:10,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2934550.0, ans=0.125
2024-08-15 01:10:12,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=2934550.0, ans=0.2
2024-08-15 01:10:31,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.251e+01 2.514e+01 2.875e+01 6.843e+01, threshold=5.029e+01, percent-clipped=1.0
2024-08-15 01:10:33,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.61 vs. limit=22.5
2024-08-15 01:10:35,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2934650.0, ans=0.05
2024-08-15 01:10:47,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs.
limit=6.0
2024-08-15 01:10:52,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2934750.0, ans=0.0
2024-08-15 01:10:58,971 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3650, loss[loss=0.1098, beats_loss=0.01113, ecapa_loss=0.0001307, whisper_loss=0.09737, over 22910.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001532, whisper_loss=0.09072, over 3874081.17 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:10:59,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2934850.0, ans=0.125
2024-08-15 01:11:20,801 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 23 from Vox, 23 from AS
2024-08-15 01:11:27,358 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 10 from Vox, 23 from AS
2024-08-15 01:11:34,056 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 20 from Vox, 25 from AS
2024-08-15 01:11:47,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2935150.0, ans=0.04949747468305833
2024-08-15 01:11:57,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2935150.0, ans=0.125
2024-08-15 01:11:58,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2935150.0, ans=0.125
2024-08-15 01:12:03,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0
2024-08-15 01:12:19,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3700, loss[loss=0.1139, beats_loss=0.01101, ecapa_loss=0.0001516, whisper_loss=0.1014, over 22041.00 frames.
], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001523, whisper_loss=0.09103, over 3872329.47 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:12:32,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2935350.0, ans=0.5
2024-08-15 01:12:44,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0
2024-08-15 01:12:46,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0
2024-08-15 01:12:59,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2935550.0, ans=0.0
2024-08-15 01:13:05,017 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 16 from Vox, 38 from AS
2024-08-15 01:13:17,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.268e+01 2.455e+01 2.851e+01 9.066e+01, threshold=4.910e+01, percent-clipped=1.0
2024-08-15 01:13:45,942 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3750, loss[loss=0.1127, beats_loss=0.01215, ecapa_loss=0.000137, whisper_loss=0.09916, over 18920.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001526, whisper_loss=0.0905, over 3850449.22 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:13:53,656 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
24 from LS+wenet, 24 from Vox, 43 from AS
2024-08-15 01:13:55,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2935850.0, ans=0.5
2024-08-15 01:14:15,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2935950.0, ans=0.1
2024-08-15 01:14:18,426 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 15 from Vox, 47 from AS
2024-08-15 01:14:23,040 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 from AS
2024-08-15 01:14:29,338 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 from AS
2024-08-15 01:14:37,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2936150.0, ans=0.1
2024-08-15 01:14:39,445 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 26 from Vox, 32 from AS
2024-08-15 01:14:57,892 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.274e-01
2024-08-15 01:15:00,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2936250.0, ans=0.125
2024-08-15 01:15:11,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3800, loss[loss=0.09882, beats_loss=0.0113, ecapa_loss=0.000173, whisper_loss=0.08579, over 22235.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001526, whisper_loss=0.08993, over 3843079.57 frames.
], batch size: 94, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:15:11,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2936350.0, ans=0.125
2024-08-15 01:15:13,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2936350.0, ans=0.0
2024-08-15 01:15:27,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2936450.0, ans=0.0
2024-08-15 01:15:27,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=22.5
2024-08-15 01:15:31,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2936450.0, ans=0.125
2024-08-15 01:15:36,351 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 11 from Vox, 25 from AS
2024-08-15 01:15:44,112 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 22 from Vox, 42 from AS
2024-08-15 01:15:45,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2936550.0, ans=0.125
2024-08-15 01:16:05,847 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 from AS
2024-08-15 01:16:06,988 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.293e+01 2.527e+01 3.101e+01 3.900e+02, threshold=5.055e+01, percent-clipped=2.0
2024-08-15 01:16:08,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0
2024-08-15 01:16:23,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs.
limit=15.0
2024-08-15 01:16:27,616 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 from AS
2024-08-15 01:16:27,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2936750.0, ans=0.0
2024-08-15 01:16:34,153 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3850, loss[loss=0.1223, beats_loss=0.01026, ecapa_loss=0.0001614, whisper_loss=0.1104, over 22213.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001521, whisper_loss=0.09051, over 3854221.79 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:16:41,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2936850.0, ans=0.0
2024-08-15 01:16:42,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2936850.0, ans=0.125
2024-08-15 01:16:44,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2936850.0, ans=0.125
2024-08-15 01:16:46,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2936850.0, ans=0.1
2024-08-15 01:17:01,504 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 01:17:02,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2936950.0, ans=0.0
2024-08-15 01:17:09,807 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 17 from Vox, 33 from AS
2024-08-15 01:17:22,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs.
limit=22.5
2024-08-15 01:17:23,611 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 from AS
2024-08-15 01:17:49,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2937250.0, ans=0.0
2024-08-15 01:17:52,461 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 22 from Vox, 43 from AS
2024-08-15 01:18:01,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3900, loss[loss=0.09833, beats_loss=0.0106, ecapa_loss=0.0001385, whisper_loss=0.08635, over 16164.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001523, whisper_loss=0.09113, over 3886065.49 frames. ], batch size: 61, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:18:26,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2937450.0, ans=0.1
2024-08-15 01:18:27,405 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS
2024-08-15 01:18:30,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0
2024-08-15 01:18:54,826 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 from AS
2024-08-15 01:18:59,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.342e+01 2.596e+01 2.902e+01 5.795e+01, threshold=5.192e+01, percent-clipped=1.0
2024-08-15 01:19:23,744 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 from AS
2024-08-15 01:19:27,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 3950, loss[loss=0.1178, beats_loss=0.00937, ecapa_loss=0.0001553, whisper_loss=0.1069, over 19075.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.0001546, whisper_loss=0.09173, over 3888144.43 frames.
], batch size: 74, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:19:29,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2937850.0, ans=0.0
2024-08-15 01:19:42,820 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 28 from Vox, 24 from AS
2024-08-15 01:20:00,863 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS
2024-08-15 01:20:01,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=12.0
2024-08-15 01:20:33,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2938150.0, ans=0.0
2024-08-15 01:20:35,476 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 from AS
2024-08-15 01:20:41,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2938250.0, ans=0.07
2024-08-15 01:20:44,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2938250.0, ans=0.0
2024-08-15 01:20:56,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4000, loss[loss=0.07979, beats_loss=0.01214, ecapa_loss=0.000147, whisper_loss=0.06619, over 17448.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01042, ecapa_loss=0.0001548, whisper_loss=0.09256, over 3903703.06 frames. ], batch size: 70, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:21:05,501 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
30 from LS+wenet, 23 from Vox, 35 from AS
2024-08-15 01:21:46,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2938550.0, ans=0.05
2024-08-15 01:21:58,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.427e+01 2.607e+01 2.957e+01 4.809e+01, threshold=5.214e+01, percent-clipped=0.0
2024-08-15 01:22:16,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2938750.0, ans=0.0
2024-08-15 01:22:24,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2938750.0, ans=0.125
2024-08-15 01:22:27,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4050, loss[loss=0.1103, beats_loss=0.01101, ecapa_loss=0.000153, whisper_loss=0.09776, over 22247.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01038, ecapa_loss=0.0001545, whisper_loss=0.09249, over 3886481.31 frames. ], batch size: 91, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:22:39,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2938850.0, ans=0.125
2024-08-15 01:22:41,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2938850.0, ans=0.0
2024-08-15 01:22:50,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2938950.0, ans=0.1
2024-08-15 01:22:58,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2938950.0, ans=0.2
2024-08-15 01:23:07,455 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
25 from LS+wenet, 24 from Vox, 40 from AS
2024-08-15 01:23:29,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2939150.0, ans=0.07
2024-08-15 01:23:33,301 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 from AS
2024-08-15 01:23:58,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4100, loss[loss=0.07794, beats_loss=0.01282, ecapa_loss=0.0001432, whisper_loss=0.06369, over 19309.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01045, ecapa_loss=0.0001548, whisper_loss=0.09167, over 3861660.76 frames. ], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:24:19,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2939450.0, ans=0.0
2024-08-15 01:24:45,902 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 from AS
2024-08-15 01:24:53,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2939650.0, ans=0.0
2024-08-15 01:24:58,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.312e+01 2.568e+01 2.993e+01 3.291e+02, threshold=5.136e+01, percent-clipped=2.0
2024-08-15 01:25:07,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2939750.0, ans=0.2
2024-08-15 01:25:26,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4150, loss[loss=0.133, beats_loss=0.008865, ecapa_loss=0.0001394, whisper_loss=0.1228, over 22516.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001531, whisper_loss=0.09098, over 3860648.39 frames. ], batch size: 85, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:25:28,103 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts.
29 from LS+wenet, 27 from Vox, 25 from AS
2024-08-15 01:25:29,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2939850.0, ans=0.09899494936611666
2024-08-15 01:25:44,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2939950.0, ans=0.1
2024-08-15 01:26:06,621 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 8 from LS+wenet, 20 from Vox, 31 from AS
2024-08-15 01:26:06,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2940050.0, ans=0.125
2024-08-15 01:26:09,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2940050.0, ans=0.0
2024-08-15 01:26:17,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2940150.0, ans=0.2
2024-08-15 01:26:22,897 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 from AS
2024-08-15 01:26:41,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2940250.0, ans=0.0
2024-08-15 01:26:52,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4200, loss[loss=0.1056, beats_loss=0.01157, ecapa_loss=0.0002011, whisper_loss=0.09198, over 20929.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001533, whisper_loss=0.09044, over 3865456.40 frames.
], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:27:06,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2940350.0, ans=0.125
2024-08-15 01:27:11,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0
2024-08-15 01:27:24,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2940550.0, ans=0.0
2024-08-15 01:27:44,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.316e+01 2.522e+01 2.873e+01 3.693e+01, threshold=5.044e+01, percent-clipped=0.0
2024-08-15 01:27:51,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2940750.0, ans=0.1
2024-08-15 01:28:06,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4250, loss[loss=0.08826, beats_loss=0.008953, ecapa_loss=0.0001416, whisper_loss=0.0779, over 17084.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001532, whisper_loss=0.09005, over 3876068.25 frames. ], batch size: 68, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:28:08,528 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 01:28:12,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2940850.0, ans=0.1
2024-08-15 01:28:15,952 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS
2024-08-15 01:28:22,437 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 from AS
2024-08-15 01:28:24,863 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
30 from LS+wenet, 20 from Vox, 39 from AS
2024-08-15 01:28:28,831 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS
2024-08-15 01:28:55,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2941150.0, ans=0.1
2024-08-15 01:28:56,847 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 25 from Vox, 29 from AS
2024-08-15 01:29:13,860 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4300, loss[loss=0.09563, beats_loss=0.01068, ecapa_loss=0.0001535, whisper_loss=0.08342, over 23079.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01073, ecapa_loss=0.0001527, whisper_loss=0.08941, over 3882141.04 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:29:22,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2941350.0, ans=0.125
2024-08-15 01:29:30,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0
2024-08-15 01:29:58,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.262e+01 2.467e+01 2.860e+01 4.963e+01, threshold=4.934e+01, percent-clipped=0.0
2024-08-15 01:30:07,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5
2024-08-15 01:30:17,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.73 vs. limit=6.0
2024-08-15 01:30:19,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4350, loss[loss=0.1175, beats_loss=0.009023, ecapa_loss=0.0001781, whisper_loss=0.1067, over 19003.00 frames.
], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001531, whisper_loss=0.08971, over 3851789.85 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 1.152921504606847e+18
2024-08-15 01:30:24,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2941850.0, ans=0.125
2024-08-15 01:30:25,596 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 15 from Vox, 34 from AS
2024-08-15 01:30:42,672 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 15 from Vox, 19 from AS
2024-08-15 01:30:52,938 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS
2024-08-15 01:31:00,294 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-08-15 01:31:03,436 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-15 01:31:25,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4400, loss[loss=0.1156, beats_loss=0.01021, ecapa_loss=0.000118, whisper_loss=0.1042, over 24664.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.08955, over 3859715.78 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 1.152921504606847e+18
2024-08-15 01:31:34,705 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 13 from Vox, 28 from AS
2024-08-15 01:31:41,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2942450.0, ans=0.125
2024-08-15 01:31:45,529 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
29 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 01:31:54,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2942550.0, ans=0.0
2024-08-15 01:32:05,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2942650.0, ans=0.125
2024-08-15 01:32:08,698 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 01:32:09,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.385e+01 2.562e+01 2.975e+01 4.289e+01, threshold=5.125e+01, percent-clipped=0.0
2024-08-15 01:32:10,133 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 from AS
2024-08-15 01:32:19,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2942750.0, ans=0.125
2024-08-15 01:32:23,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=12.0
2024-08-15 01:32:30,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4450, loss[loss=0.1146, beats_loss=0.01042, ecapa_loss=0.0001503, whisper_loss=0.1026, over 23083.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.000152, whisper_loss=0.09005, over 3869633.56 frames.
], batch size: 91, lr: 2.98e-03, grad_scale: 1.152921504606847e+18
2024-08-15 01:32:36,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2942850.0, ans=0.125
2024-08-15 01:32:38,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2942850.0, ans=0.09899494936611666
2024-08-15 01:32:42,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2942950.0, ans=0.125
2024-08-15 01:32:49,001 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 from AS
2024-08-15 01:33:05,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=12.0
2024-08-15 01:33:06,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2943050.0, ans=0.2
2024-08-15 01:33:07,309 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 12 from Vox, 34 from AS
2024-08-15 01:33:11,286 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 from AS
2024-08-15 01:33:36,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4500, loss[loss=0.1194, beats_loss=0.01001, ecapa_loss=0.0001458, whisper_loss=0.1079, over 21669.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01069, ecapa_loss=0.0001522, whisper_loss=0.08973, over 3867092.84 frames. ], batch size: 84, lr: 2.98e-03, grad_scale: 5.764607523034235e+17
2024-08-15 01:33:36,455 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 01:33:40,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2943350.0, ans=0.0 2024-08-15 01:33:42,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=30.09 vs. limit=22.5 2024-08-15 01:33:48,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2943450.0, ans=0.125 2024-08-15 01:33:54,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2943450.0, ans=0.0 2024-08-15 01:34:07,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2943550.0, ans=0.1 2024-08-15 01:34:15,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2943650.0, ans=0.1 2024-08-15 01:34:15,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2943650.0, ans=0.1 2024-08-15 01:34:23,173 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.368e+01 2.661e+01 3.187e+01 2.204e+02, threshold=5.323e+01, percent-clipped=1.0 2024-08-15 01:34:41,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2943850.0, ans=0.1 2024-08-15 01:34:42,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4550, loss[loss=0.102, beats_loss=0.00879, ecapa_loss=0.0001857, whisper_loss=0.09132, over 13787.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01067, ecapa_loss=0.0001532, whisper_loss=0.08998, over 3881613.10 frames. 
], batch size: 56, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:34:53,416 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-15 01:35:17,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2944050.0, ans=0.125 2024-08-15 01:35:18,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2944050.0, ans=0.2 2024-08-15 01:35:29,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2944150.0, ans=0.125 2024-08-15 01:35:30,111 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 01:35:34,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2944250.0, ans=0.0 2024-08-15 01:35:37,591 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 01:35:48,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4600, loss[loss=0.1005, beats_loss=0.01266, ecapa_loss=0.0001557, whisper_loss=0.08625, over 21884.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001529, whisper_loss=0.09077, over 3917859.83 frames. ], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:36:06,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2944450.0, ans=0.2 2024-08-15 01:36:07,176 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 01:36:11,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2944450.0, ans=0.125 2024-08-15 01:36:16,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2944550.0, ans=0.1 2024-08-15 01:36:18,401 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-15 01:36:18,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=12.0 2024-08-15 01:36:19,651 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-15 01:36:23,747 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 01:36:33,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2944650.0, ans=0.125 2024-08-15 01:36:33,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-08-15 01:36:33,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.310e+01 2.603e+01 2.915e+01 4.398e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 01:36:44,952 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-15 01:36:50,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2944750.0, ans=0.125 2024-08-15 01:36:53,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4650, loss[loss=0.1118, beats_loss=0.008822, ecapa_loss=0.0001752, whisper_loss=0.1012, over 21213.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001528, whisper_loss=0.09103, over 3886664.05 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:36:54,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2944850.0, ans=0.0 2024-08-15 01:36:59,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2944850.0, ans=0.125 2024-08-15 01:37:03,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2944850.0, ans=0.09899494936611666 2024-08-15 01:37:08,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2944950.0, ans=0.125 2024-08-15 01:37:11,166 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 01:37:28,622 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 01:37:37,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2945150.0, ans=0.125 2024-08-15 01:37:46,421 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 01:37:59,188 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4700, loss[loss=0.09571, beats_loss=0.009809, ecapa_loss=0.0001703, whisper_loss=0.08419, over 21965.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001524, whisper_loss=0.09066, over 3881933.79 frames. 
], batch size: 91, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:37:59,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2945350.0, ans=0.2 2024-08-15 01:38:21,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2945450.0, ans=10.0 2024-08-15 01:38:23,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-15 01:38:33,617 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 01:38:37,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2945650.0, ans=0.1 2024-08-15 01:38:41,022 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 01:38:43,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2945650.0, ans=0.2 2024-08-15 01:38:44,921 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.361e+01 2.586e+01 2.935e+01 3.925e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-15 01:38:48,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2945650.0, ans=0.0 2024-08-15 01:39:05,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4750, loss[loss=0.1206, beats_loss=0.009451, ecapa_loss=0.0001814, whisper_loss=0.1093, over 21086.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.09073, over 3896056.70 frames. 
], batch size: 84, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:39:27,807 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-15 01:39:40,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2946050.0, ans=0.125 2024-08-15 01:39:43,218 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 01:39:52,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2946150.0, ans=0.0 2024-08-15 01:39:59,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-15 01:40:04,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0 2024-08-15 01:40:14,058 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4800, loss[loss=0.1178, beats_loss=0.01084, ecapa_loss=0.0001589, whisper_loss=0.1054, over 23427.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.000151, whisper_loss=0.09049, over 3908719.38 frames. ], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:40:14,181 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 01:40:14,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2946350.0, ans=0.1 2024-08-15 01:40:18,494 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 01:40:23,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2946350.0, ans=0.125 2024-08-15 01:40:43,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2946550.0, ans=0.025 2024-08-15 01:40:45,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2946550.0, ans=0.1 2024-08-15 01:40:48,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2024-08-15 01:41:06,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2946650.0, ans=0.125 2024-08-15 01:41:07,534 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.219e+01 2.446e+01 2.733e+01 3.979e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-15 01:41:15,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2946750.0, ans=0.0 2024-08-15 01:41:30,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4850, loss[loss=0.1184, beats_loss=0.00945, ecapa_loss=0.0001508, whisper_loss=0.1075, over 22915.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001516, whisper_loss=0.09025, over 3930248.05 frames. 
], batch size: 90, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:41:36,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2946850.0, ans=0.1 2024-08-15 01:41:37,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2946850.0, ans=0.0 2024-08-15 01:41:44,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.44 vs. limit=22.5 2024-08-15 01:41:45,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2946950.0, ans=0.2 2024-08-15 01:41:53,381 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 01:41:58,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2946950.0, ans=0.0 2024-08-15 01:41:59,891 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 01:42:03,821 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 01:42:31,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2947150.0, ans=0.1 2024-08-15 01:42:32,763 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 01:42:35,357 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 01:42:41,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2947250.0, ans=0.0 2024-08-15 01:42:46,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2947250.0, ans=0.125 2024-08-15 01:42:49,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4900, loss[loss=0.09403, beats_loss=0.009736, ecapa_loss=0.0001689, whisper_loss=0.08261, over 14483.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001522, whisper_loss=0.09066, over 3904983.51 frames. ], batch size: 60, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:42:49,564 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-15 01:42:51,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2947350.0, ans=0.0 2024-08-15 01:42:57,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-15 01:43:04,494 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 01:43:04,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2947450.0, ans=0.2 2024-08-15 01:43:08,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2947450.0, ans=0.125 2024-08-15 01:43:08,809 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.370e+00 2024-08-15 01:43:39,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2947650.0, ans=0.0 2024-08-15 01:43:40,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-08-15 01:43:42,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.371e+01 2.641e+01 2.917e+01 5.290e+01, threshold=5.283e+01, percent-clipped=1.0 2024-08-15 01:43:43,017 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 34 from Vox, 28 fro AS 2024-08-15 01:43:44,467 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 01:43:45,738 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 10 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 01:43:54,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-15 01:44:01,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2947750.0, ans=0.125 2024-08-15 01:44:04,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 4950, loss[loss=0.08795, beats_loss=0.01044, ecapa_loss=0.0001908, whisper_loss=0.0756, over 16490.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09012, over 3855030.59 frames. ], batch size: 69, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:44:18,246 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 01:44:18,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2947950.0, ans=0.0 2024-08-15 01:44:30,785 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 01:44:32,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2948050.0, ans=0.125 2024-08-15 01:44:36,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2948050.0, ans=0.125 2024-08-15 01:44:43,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2948050.0, ans=0.125 2024-08-15 01:44:51,377 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 41 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 01:44:58,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.37 vs. limit=10.0 2024-08-15 01:45:13,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5000, loss[loss=0.1103, beats_loss=0.0119, ecapa_loss=0.0001586, whisper_loss=0.09679, over 21955.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001525, whisper_loss=0.09009, over 3844677.31 frames. 
], batch size: 91, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:45:19,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2948350.0, ans=0.035 2024-08-15 01:45:22,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2948350.0, ans=0.0 2024-08-15 01:45:30,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2948450.0, ans=0.125 2024-08-15 01:45:43,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2948550.0, ans=0.2 2024-08-15 01:45:58,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.280e+01 2.510e+01 2.795e+01 4.349e+01, threshold=5.019e+01, percent-clipped=0.0 2024-08-15 01:46:10,526 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 44 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 01:46:17,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5050, loss[loss=0.09772, beats_loss=0.01004, ecapa_loss=0.000147, whisper_loss=0.08622, over 15984.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001525, whisper_loss=0.0905, over 3865893.09 frames. ], batch size: 62, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:46:32,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2948950.0, ans=0.0 2024-08-15 01:46:35,189 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-15 01:46:40,412 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 01:46:47,240 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-15 01:47:00,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2949150.0, ans=0.0 2024-08-15 01:47:08,048 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 01:47:11,713 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 01:47:23,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5100, loss[loss=0.106, beats_loss=0.007708, ecapa_loss=0.000168, whisper_loss=0.09657, over 19816.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.000152, whisper_loss=0.09035, over 3898481.46 frames. ], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:47:25,850 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 01:47:26,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-15 01:47:29,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2949350.0, ans=0.125 2024-08-15 01:47:36,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2949450.0, ans=0.125 2024-08-15 01:47:54,359 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 01:47:57,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2949550.0, ans=0.1 2024-08-15 01:48:06,545 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:48:08,692 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.324e+01 2.645e+01 2.910e+01 4.236e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-15 01:48:09,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2949650.0, ans=0.0 2024-08-15 01:48:19,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2949750.0, ans=0.1 2024-08-15 01:48:19,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2949750.0, ans=0.1 2024-08-15 01:48:23,000 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 01:48:26,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2949750.0, ans=0.1 2024-08-15 01:48:28,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5150, loss[loss=0.1098, beats_loss=0.00862, ecapa_loss=0.0001604, whisper_loss=0.09961, over 21977.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01076, ecapa_loss=0.0001529, whisper_loss=0.09032, over 3876731.66 frames. ], batch size: 87, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:48:32,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2949850.0, ans=0.125 2024-08-15 01:49:01,670 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
37 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 01:49:06,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-08-15 01:49:13,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2950150.0, ans=0.125 2024-08-15 01:49:20,033 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-15 01:49:21,351 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-15 01:49:22,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.34 vs. limit=22.5 2024-08-15 01:49:22,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2950250.0, ans=0.1 2024-08-15 01:49:27,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2950250.0, ans=0.125 2024-08-15 01:49:33,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5200, loss[loss=0.08068, beats_loss=0.01066, ecapa_loss=0.0001675, whisper_loss=0.06834, over 15157.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.000153, whisper_loss=0.0905, over 3848317.92 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:49:33,485 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 01:49:50,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.73 vs. 
limit=15.0 2024-08-15 01:50:19,925 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.309e+01 2.539e+01 2.841e+01 2.667e+02, threshold=5.077e+01, percent-clipped=2.0 2024-08-15 01:50:21,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2950650.0, ans=0.0 2024-08-15 01:50:40,238 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5250, loss[loss=0.1121, beats_loss=0.008494, ecapa_loss=0.0001883, whisper_loss=0.1017, over 22452.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01081, ecapa_loss=0.000152, whisper_loss=0.08965, over 3832830.68 frames. ], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:50:40,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2950850.0, ans=0.07 2024-08-15 01:50:53,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2950950.0, ans=0.125 2024-08-15 01:50:59,787 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 01:51:02,261 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 01:51:15,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2951050.0, ans=0.2 2024-08-15 01:51:16,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2951050.0, ans=0.125 2024-08-15 01:51:22,543 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 01:51:38,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2951250.0, ans=0.2 2024-08-15 01:51:45,308 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
17 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-15 01:51:45,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2951250.0, ans=0.2 2024-08-15 01:51:48,271 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 01:51:50,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-15 01:51:51,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5300, loss[loss=0.1141, beats_loss=0.0113, ecapa_loss=0.0001107, whisper_loss=0.1017, over 20303.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001529, whisper_loss=0.09014, over 3837268.25 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:51:52,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2951350.0, ans=0.035 2024-08-15 01:52:06,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-15 01:52:12,024 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 01:52:22,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2951550.0, ans=0.125 2024-08-15 01:52:25,740 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
14 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 01:52:43,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.269e+01 2.498e+01 2.805e+01 4.853e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-15 01:52:56,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2951750.0, ans=0.125 2024-08-15 01:52:59,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2951750.0, ans=0.1 2024-08-15 01:53:04,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2951750.0, ans=0.125 2024-08-15 01:53:07,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5350, loss[loss=0.09259, beats_loss=0.01208, ecapa_loss=9.829e-05, whisper_loss=0.07952, over 17374.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001526, whisper_loss=0.09022, over 3831247.01 frames. ], batch size: 64, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:53:10,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2024-08-15 01:53:16,140 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-15 01:53:22,923 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 01:53:27,879 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 01:53:53,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.95 vs. 
limit=22.5 2024-08-15 01:54:10,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2952250.0, ans=0.125 2024-08-15 01:54:11,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.95 vs. limit=22.5 2024-08-15 01:54:15,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2952250.0, ans=0.0 2024-08-15 01:54:15,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.49 vs. limit=22.5 2024-08-15 01:54:16,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.48 vs. limit=15.0 2024-08-15 01:54:18,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2952250.0, ans=0.05 2024-08-15 01:54:26,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=12.0 2024-08-15 01:54:26,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5400, loss[loss=0.1116, beats_loss=0.008874, ecapa_loss=0.0001519, whisper_loss=0.1012, over 22365.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001523, whisper_loss=0.0903, over 3855717.24 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:54:27,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2952350.0, ans=0.125 2024-08-15 01:54:40,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. 
limit=15.0 2024-08-15 01:55:06,317 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 01:55:19,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.348e+01 2.605e+01 2.891e+01 6.130e+01, threshold=5.210e+01, percent-clipped=1.0 2024-08-15 01:55:20,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2952650.0, ans=0.2 2024-08-15 01:55:26,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2952650.0, ans=0.2 2024-08-15 01:55:43,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2952850.0, ans=0.125 2024-08-15 01:55:44,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5450, loss[loss=0.0911, beats_loss=0.01104, ecapa_loss=0.0001433, whisper_loss=0.07863, over 21721.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001526, whisper_loss=0.09029, over 3878756.81 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:55:53,355 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=12.0 2024-08-15 01:55:57,544 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 37 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-15 01:56:08,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2952950.0, ans=0.1 2024-08-15 01:56:14,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. 
limit=15.0 2024-08-15 01:56:23,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2953050.0, ans=0.04949747468305833 2024-08-15 01:56:37,994 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 32 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 01:56:44,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2953150.0, ans=0.5 2024-08-15 01:56:53,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2953250.0, ans=0.2 2024-08-15 01:57:06,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5500, loss[loss=0.09279, beats_loss=0.01286, ecapa_loss=0.0001474, whisper_loss=0.07846, over 21813.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001515, whisper_loss=0.09131, over 3917570.17 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:57:14,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2953350.0, ans=0.0 2024-08-15 01:57:18,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2953350.0, ans=0.025 2024-08-15 01:57:25,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=22.5 2024-08-15 01:57:32,059 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-15 01:57:34,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2953450.0, ans=0.125 2024-08-15 01:58:02,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. 
limit=22.5 2024-08-15 01:58:04,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.277e+01 2.522e+01 2.854e+01 1.046e+02, threshold=5.045e+01, percent-clipped=1.0 2024-08-15 01:58:12,325 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 01:58:16,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.42 vs. limit=15.0 2024-08-15 01:58:18,579 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-15 01:58:27,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5550, loss[loss=0.1181, beats_loss=0.011, ecapa_loss=0.0001201, whisper_loss=0.1059, over 24793.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.000153, whisper_loss=0.09113, over 3917633.57 frames. ], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:58:29,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2953850.0, ans=0.125 2024-08-15 01:58:31,052 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 01:59:00,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2954050.0, ans=0.0 2024-08-15 01:59:03,422 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 01:59:06,602 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 01:59:17,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.23 vs. 
limit=15.0 2024-08-15 01:59:23,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2954150.0, ans=0.2 2024-08-15 01:59:47,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5600, loss[loss=0.08279, beats_loss=0.01186, ecapa_loss=0.0001413, whisper_loss=0.06952, over 18466.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001524, whisper_loss=0.09151, over 3915849.13 frames. ], batch size: 72, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:59:59,258 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 28 from Vox, 18 fro AS 2024-08-15 02:00:12,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2954450.0, ans=0.125 2024-08-15 02:00:23,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2954550.0, ans=0.0 2024-08-15 02:00:35,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2954650.0, ans=10.0 2024-08-15 02:00:36,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2954650.0, ans=0.0 2024-08-15 02:00:40,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2954650.0, ans=0.1 2024-08-15 02:00:43,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.248e+01 2.477e+01 2.810e+01 7.862e+01, threshold=4.953e+01, percent-clipped=1.0 2024-08-15 02:00:46,885 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 02:00:54,853 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 02:01:06,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5650, loss[loss=0.06534, beats_loss=0.01499, ecapa_loss=0.0001188, whisper_loss=0.04916, over 18642.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001524, whisper_loss=0.09108, over 3928813.72 frames. ], batch size: 76, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:01:10,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2954850.0, ans=0.125 2024-08-15 02:01:13,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2954850.0, ans=0.0 2024-08-15 02:01:34,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2024-08-15 02:01:35,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2955050.0, ans=0.125 2024-08-15 02:01:36,646 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 02:01:36,977 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:01:41,366 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 02:01:43,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2955050.0, ans=0.125 2024-08-15 02:02:09,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2955250.0, ans=0.125 2024-08-15 02:02:14,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2955250.0, ans=0.2 2024-08-15 02:02:18,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5700, loss[loss=0.1026, beats_loss=0.01141, ecapa_loss=0.0001707, whisper_loss=0.08944, over 21274.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001535, whisper_loss=0.09197, over 3913163.79 frames. ], batch size: 90, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:02:21,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2955350.0, ans=0.125 2024-08-15 02:02:24,154 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 02:02:42,771 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-15 02:02:48,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2955550.0, ans=0.02 2024-08-15 02:03:02,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2024-08-15 02:03:03,748 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
21 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 02:03:06,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.504e+01 2.865e+01 3.291e+01 2.428e+02, threshold=5.731e+01, percent-clipped=5.0 2024-08-15 02:03:22,054 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-15 02:03:27,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5750, loss[loss=0.08232, beats_loss=0.01198, ecapa_loss=0.0001327, whisper_loss=0.06901, over 16325.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.000152, whisper_loss=0.09125, over 3922233.69 frames. ], batch size: 64, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:03:32,151 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 02:03:50,217 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 02:03:53,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.54 vs. limit=22.5 2024-08-15 02:03:58,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2956050.0, ans=0.0 2024-08-15 02:04:02,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2956050.0, ans=0.125 2024-08-15 02:04:29,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2956250.0, ans=0.0 2024-08-15 02:04:30,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-15 02:04:34,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.90 vs. 
limit=15.0 2024-08-15 02:04:35,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5800, loss[loss=0.09356, beats_loss=0.01231, ecapa_loss=0.0001748, whisper_loss=0.0795, over 21947.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01062, ecapa_loss=0.000153, whisper_loss=0.0919, over 3913786.70 frames. ], batch size: 91, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:04:43,593 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.524e-03 2024-08-15 02:04:58,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2956450.0, ans=0.125 2024-08-15 02:05:05,595 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 02:05:19,972 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 02:05:24,253 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 19 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-15 02:05:25,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.342e+01 2.666e+01 2.998e+01 4.632e+01, threshold=5.332e+01, percent-clipped=0.0 2024-08-15 02:05:44,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2956850.0, ans=0.125 2024-08-15 02:05:45,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5850, loss[loss=0.07196, beats_loss=0.01212, ecapa_loss=0.0001644, whisper_loss=0.0582, over 22042.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001525, whisper_loss=0.09139, over 3916119.86 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:05:49,836 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 02:05:52,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2956850.0, ans=0.0 2024-08-15 02:05:59,080 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 02:06:34,897 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-15 02:06:53,792 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-15 02:06:55,582 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 02:06:58,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2957350.0, ans=0.125 2024-08-15 02:06:59,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5900, loss[loss=0.08917, beats_loss=0.01199, ecapa_loss=0.0001135, whisper_loss=0.07605, over 16222.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001526, whisper_loss=0.09071, over 3896360.33 frames. ], batch size: 64, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:07:03,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2957350.0, ans=0.125 2024-08-15 02:07:03,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5 2024-08-15 02:07:07,401 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 02:07:07,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2957350.0, ans=0.1 2024-08-15 02:07:07,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-15 02:07:09,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2957350.0, ans=0.05 2024-08-15 02:07:11,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2957350.0, ans=0.125 2024-08-15 02:07:15,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2957450.0, ans=0.125 2024-08-15 02:07:26,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2957450.0, ans=0.125 2024-08-15 02:07:37,723 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 02:07:54,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.296e+01 2.523e+01 2.887e+01 4.052e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-15 02:07:55,188 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 02:07:55,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2957650.0, ans=0.125 2024-08-15 02:07:57,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.35 vs. 
limit=12.0 2024-08-15 02:08:07,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2024-08-15 02:08:15,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 5950, loss[loss=0.1123, beats_loss=0.01054, ecapa_loss=0.0001618, whisper_loss=0.1001, over 22895.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001526, whisper_loss=0.09021, over 3914149.40 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:08:19,855 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 02:08:35,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2957950.0, ans=0.125 2024-08-15 02:08:36,624 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-15 02:08:41,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2957950.0, ans=0.125 2024-08-15 02:08:42,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2957950.0, ans=0.125 2024-08-15 02:08:53,694 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 02:08:55,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2958050.0, ans=0.125 2024-08-15 02:08:57,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2958050.0, ans=0.1 2024-08-15 02:09:04,536 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 02:09:06,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2958150.0, ans=0.125 2024-08-15 02:09:10,190 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 02:09:15,655 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 02:09:23,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-08-15 02:09:38,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2958250.0, ans=0.0 2024-08-15 02:09:39,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2958350.0, ans=0.1 2024-08-15 02:09:40,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6000, loss[loss=0.1073, beats_loss=0.01081, ecapa_loss=0.0001704, whisper_loss=0.09478, over 22233.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001526, whisper_loss=0.09012, over 3888315.58 frames. ], batch size: 93, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:09:40,779 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 02:10:47,998 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005526, whisper_loss=0.2479, over 922467.00 frames. 2024-08-15 02:11:14,102 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on SV_voxceleb1: loss=0.004315, beats_loss=0, ecapa_loss=0.0004315, whisper_loss=0, over 939242.00 frames. 
2024-08-15 02:11:41,559 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1652, 1.7864, 1.6586, 1.5572], device='cuda:0') 2024-08-15 02:13:55,480 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0032, 2.5443, 2.2535, 1.9510], device='cuda:0') 2024-08-15 02:14:20,452 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 02:14:20,457 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 02:14:32,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2958350.0, ans=0.0 2024-08-15 02:14:36,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-15 02:14:52,130 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 02:15:15,279 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 02:15:17,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.230e+01 2.610e+01 2.865e+01 2.775e+02, threshold=5.221e+01, percent-clipped=3.0 2024-08-15 02:15:22,131 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 02:15:26,091 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 02:15:27,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2958750.0, ans=0.125 2024-08-15 02:15:37,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2958850.0, ans=0.2 2024-08-15 02:15:38,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6050, loss[loss=0.1485, beats_loss=0.007264, ecapa_loss=0.0001454, whisper_loss=0.1398, over 25132.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001519, whisper_loss=0.0902, over 3890526.94 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:15:50,918 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-15 02:15:52,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2958950.0, ans=0.1 2024-08-15 02:15:54,591 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-15 02:15:54,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2958950.0, ans=0.125 2024-08-15 02:15:57,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2958950.0, ans=0.125 2024-08-15 02:16:00,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2958950.0, ans=0.0 2024-08-15 02:16:16,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.95 vs. 
limit=12.0 2024-08-15 02:16:19,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2959150.0, ans=0.2 2024-08-15 02:16:19,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.83 vs. limit=10.0 2024-08-15 02:16:43,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6100, loss[loss=0.1013, beats_loss=0.01173, ecapa_loss=0.0001115, whisper_loss=0.08846, over 22691.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001525, whisper_loss=0.09033, over 3900395.99 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:16:54,445 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 02:17:27,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2959650.0, ans=0.0 2024-08-15 02:17:28,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2959650.0, ans=0.125 2024-08-15 02:17:29,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.268e+01 2.472e+01 3.000e+01 1.337e+02, threshold=4.943e+01, percent-clipped=1.0 2024-08-15 02:17:49,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6150, loss[loss=0.1177, beats_loss=0.008769, ecapa_loss=0.00018, whisper_loss=0.1071, over 17628.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001533, whisper_loss=0.09032, over 3875948.67 frames. 
], batch size: 72, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:17:53,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2959850.0, ans=0.1 2024-08-15 02:17:56,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2959850.0, ans=0.2 2024-08-15 02:17:58,740 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09473193436861038, model_norm_threshold=49.43223190307617 2024-08-15 02:17:58,949 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.424e+04, grad_sumsq=2.424e+04, orig_rms_sq=1.000e+00 2024-08-15 02:18:08,434 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-296000.pt 2024-08-15 02:18:12,909 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-15 02:18:15,528 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-15 02:18:32,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2960050.0, ans=0.1 2024-08-15 02:18:54,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2960250.0, ans=0.0 2024-08-15 02:19:01,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6200, loss[loss=0.09238, beats_loss=0.0125, ecapa_loss=0.0001877, whisper_loss=0.078, over 20250.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001534, whisper_loss=0.09001, over 3904436.68 frames. ], batch size: 88, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:19:12,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=12.0 2024-08-15 02:19:33,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2960550.0, ans=0.0 2024-08-15 02:19:49,404 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.384e+01 2.613e+01 3.033e+01 5.218e+02, threshold=5.226e+01, percent-clipped=4.0 2024-08-15 02:19:50,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.01 vs. limit=22.5 2024-08-15 02:19:54,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-15 02:19:56,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2960750.0, ans=0.0 2024-08-15 02:20:04,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2960750.0, ans=0.0 2024-08-15 02:20:09,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6250, loss[loss=0.1253, beats_loss=0.008997, ecapa_loss=0.0001507, whisper_loss=0.1148, over 16431.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01081, ecapa_loss=0.0001533, whisper_loss=0.08957, over 3884168.53 frames. ], batch size: 62, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:20:12,189 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-15 02:20:30,059 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
24 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-15 02:20:36,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. limit=6.0 2024-08-15 02:20:58,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2961150.0, ans=0.1 2024-08-15 02:21:11,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2961250.0, ans=0.0 2024-08-15 02:21:12,803 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 02:21:13,963 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 02:21:15,219 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 02:21:17,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6300, loss[loss=0.09594, beats_loss=0.01206, ecapa_loss=0.0001085, whisper_loss=0.0828, over 15765.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01088, ecapa_loss=0.0001523, whisper_loss=0.0893, over 3857622.31 frames. ], batch size: 60, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:21:19,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2961350.0, ans=10.0 2024-08-15 02:21:22,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-15 02:21:44,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2961550.0, ans=0.0 2024-08-15 02:21:46,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. 
limit=6.0 2024-08-15 02:21:50,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2024-08-15 02:21:59,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2961650.0, ans=0.125 2024-08-15 02:22:04,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.318e+01 2.564e+01 2.783e+01 4.377e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-15 02:22:09,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2961750.0, ans=0.125 2024-08-15 02:22:17,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2961750.0, ans=0.0 2024-08-15 02:22:20,772 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-15 02:22:23,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6350, loss[loss=0.07727, beats_loss=0.008687, ecapa_loss=0.0001749, whisper_loss=0.06683, over 13473.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.000154, whisper_loss=0.08993, over 3868356.00 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:22:33,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2961850.0, ans=0.125 2024-08-15 02:22:36,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2961950.0, ans=0.0 2024-08-15 02:22:40,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2024-08-15 02:22:42,681 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 02:22:45,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2961950.0, ans=0.0 2024-08-15 02:22:57,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2024-08-15 02:23:09,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2024-08-15 02:23:10,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-15 02:23:20,357 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 02:23:23,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-15 02:23:29,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6400, loss[loss=0.09604, beats_loss=0.01095, ecapa_loss=0.000141, whisper_loss=0.08367, over 20968.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001545, whisper_loss=0.08995, over 3860484.83 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:23:30,490 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 02:23:36,836 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 02:23:38,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2962350.0, ans=0.1 2024-08-15 02:24:14,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.481e+01 2.409e+01 2.750e+01 3.071e+01 4.179e+02, threshold=5.499e+01, percent-clipped=4.0 2024-08-15 02:24:24,123 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-15 02:24:30,933 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 02:24:34,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6450, loss[loss=0.08528, beats_loss=0.01365, ecapa_loss=0.0001673, whisper_loss=0.06996, over 17886.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001545, whisper_loss=0.09041, over 3888220.43 frames. ], batch size: 74, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:24:37,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2962850.0, ans=0.125 2024-08-15 02:24:57,172 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 02:25:06,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2963050.0, ans=0.125 2024-08-15 02:25:09,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2963050.0, ans=0.1 2024-08-15 02:25:28,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=12.0 2024-08-15 02:25:30,510 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-15 02:25:36,755 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 02:25:40,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6500, loss[loss=0.1101, beats_loss=0.0106, ecapa_loss=0.0001601, whisper_loss=0.09786, over 22731.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001539, whisper_loss=0.09071, over 3889889.43 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:25:46,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2963350.0, ans=0.125 2024-08-15 02:25:52,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-08-15 02:25:53,401 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-15 02:25:55,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2024-08-15 02:26:19,215 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 13 from Vox, 51 fro AS 2024-08-15 02:26:22,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2963650.0, ans=0.1 2024-08-15 02:26:25,670 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 02:26:26,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.12 vs. 
limit=22.5 2024-08-15 02:26:26,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.262e+01 2.539e+01 2.761e+01 6.579e+01, threshold=5.077e+01, percent-clipped=1.0 2024-08-15 02:26:46,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6550, loss[loss=0.09472, beats_loss=0.01231, ecapa_loss=0.0001312, whisper_loss=0.08109, over 15884.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001533, whisper_loss=0.09074, over 3920587.92 frames. ], batch size: 61, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:27:44,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2964250.0, ans=0.95 2024-08-15 02:27:49,661 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 02:27:50,826 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6600, loss[loss=0.1187, beats_loss=0.01028, ecapa_loss=0.0001552, whisper_loss=0.1068, over 22476.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001535, whisper_loss=0.09109, over 3964978.72 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:27:53,557 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 02:27:57,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2964350.0, ans=0.125 2024-08-15 02:27:59,632 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 02:28:05,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2964450.0, ans=0.125 2024-08-15 02:28:16,476 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 02:28:17,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2964550.0, ans=0.125 2024-08-15 02:28:20,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2964550.0, ans=0.125 2024-08-15 02:28:27,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2964550.0, ans=0.0 2024-08-15 02:28:31,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2964650.0, ans=0.125 2024-08-15 02:28:35,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.343e+01 2.654e+01 2.960e+01 4.414e+01, threshold=5.309e+01, percent-clipped=0.0 2024-08-15 02:28:50,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2964750.0, ans=0.125 2024-08-15 02:28:55,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6650, loss[loss=0.1113, beats_loss=0.0104, ecapa_loss=0.0001407, whisper_loss=0.09948, over 21405.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001532, whisper_loss=0.09104, over 3995671.56 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:29:09,993 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 02:29:25,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2965050.0, ans=0.125 2024-08-15 02:29:26,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2965050.0, ans=0.125 2024-08-15 02:29:27,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5 2024-08-15 02:29:37,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2965150.0, ans=0.125 2024-08-15 02:29:40,798 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 02:29:59,714 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:29:59,757 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:29:59,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2965250.0, ans=0.125 2024-08-15 02:30:01,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6700, loss[loss=0.1003, beats_loss=0.01168, ecapa_loss=0.0001463, whisper_loss=0.08711, over 18939.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001537, whisper_loss=0.09098, over 3952987.37 frames. ], batch size: 74, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:30:15,573 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 02:30:18,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.28 vs. 
limit=15.0 2024-08-15 02:30:22,622 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 02:30:24,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=22.5 2024-08-15 02:30:51,822 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.368e+01 2.669e+01 2.995e+01 9.040e+01, threshold=5.338e+01, percent-clipped=3.0 2024-08-15 02:30:52,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2024-08-15 02:31:06,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2965750.0, ans=0.125 2024-08-15 02:31:14,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6750, loss[loss=0.1065, beats_loss=0.009487, ecapa_loss=0.0001488, whisper_loss=0.09549, over 16745.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.000153, whisper_loss=0.0906, over 3961659.13 frames. ], batch size: 65, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:31:20,850 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-15 02:31:33,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2965950.0, ans=0.0 2024-08-15 02:31:44,858 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 02:31:47,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2966050.0, ans=0.125 2024-08-15 02:31:51,606 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 02:31:53,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2966050.0, ans=0.125 2024-08-15 02:31:53,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2966050.0, ans=0.09899494936611666 2024-08-15 02:31:57,052 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 02:31:59,746 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 02:32:03,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2966150.0, ans=0.0 2024-08-15 02:32:08,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2024-08-15 02:32:10,380 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 02:32:29,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6800, loss[loss=0.1036, beats_loss=0.01116, ecapa_loss=0.000152, whisper_loss=0.09093, over 18117.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001547, whisper_loss=0.09113, over 3934131.81 frames. ], batch size: 72, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:32:33,303 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 02:32:50,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2966450.0, ans=0.125 2024-08-15 02:33:03,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2966550.0, ans=0.0 2024-08-15 02:33:09,484 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 02:33:14,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2966650.0, ans=0.125 2024-08-15 02:33:25,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.273e+01 2.576e+01 2.834e+01 4.792e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-15 02:33:28,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2966650.0, ans=0.1 2024-08-15 02:33:30,211 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-15 02:33:37,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2966750.0, ans=0.05 2024-08-15 02:33:39,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2966750.0, ans=0.125 2024-08-15 02:33:43,181 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 02:33:45,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2966850.0, ans=0.125 2024-08-15 02:33:46,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6850, loss[loss=0.07622, beats_loss=0.01388, ecapa_loss=0.0001319, whisper_loss=0.06102, over 21953.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001547, whisper_loss=0.09132, over 3919444.41 frames. ], batch size: 94, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:34:16,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2967050.0, ans=0.0 2024-08-15 02:34:19,285 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 02:34:40,252 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 02:34:46,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2967150.0, ans=0.1 2024-08-15 02:34:50,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-15 02:34:53,112 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 9 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-15 02:35:05,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6900, loss[loss=0.1119, beats_loss=0.01069, ecapa_loss=0.000149, whisper_loss=0.09975, over 22461.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001536, whisper_loss=0.09108, over 3899095.32 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:35:51,334 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 02:36:02,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.306e+01 2.522e+01 2.757e+01 3.704e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-15 02:36:15,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.88 vs. 
limit=15.0 2024-08-15 02:36:20,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2967750.0, ans=0.125 2024-08-15 02:36:24,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 6950, loss[loss=0.09836, beats_loss=0.01381, ecapa_loss=0.00011, whisper_loss=0.08345, over 20948.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001525, whisper_loss=0.09052, over 3866843.90 frames. ], batch size: 81, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:36:27,958 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 02:36:33,930 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-15 02:37:01,075 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 02:37:19,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2024-08-15 02:37:31,470 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 02:37:34,415 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-15 02:37:40,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7000, loss[loss=0.1163, beats_loss=0.00949, ecapa_loss=0.0001709, whisper_loss=0.1051, over 19987.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.000152, whisper_loss=0.09086, over 3889652.33 frames. ], batch size: 78, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:37:43,643 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 02:37:55,599 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 02:37:58,155 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 02:38:08,906 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 02:38:22,141 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 02:38:22,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2968550.0, ans=0.125 2024-08-15 02:38:24,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2968550.0, ans=0.0 2024-08-15 02:38:38,369 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.236e+01 2.500e+01 2.764e+01 4.319e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-15 02:38:41,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2968650.0, ans=0.125 2024-08-15 02:39:00,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7050, loss[loss=0.08135, beats_loss=0.01323, ecapa_loss=0.0001633, whisper_loss=0.06649, over 20749.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001527, whisper_loss=0.09053, over 3881122.32 frames. ], batch size: 88, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:39:14,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2968850.0, ans=0.1 2024-08-15 02:39:27,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-08-15 02:39:41,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. 
limit=15.0 2024-08-15 02:39:47,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2969150.0, ans=0.2 2024-08-15 02:40:20,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7100, loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001428, whisper_loss=0.09231, over 22316.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001532, whisper_loss=0.09151, over 3879373.74 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:40:30,188 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 22 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-15 02:40:34,965 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 02:40:41,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2969450.0, ans=0.1 2024-08-15 02:41:06,633 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 02:41:09,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. limit=10.0 2024-08-15 02:41:19,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.324e+01 2.523e+01 2.719e+01 3.184e+02, threshold=5.045e+01, percent-clipped=4.0 2024-08-15 02:41:21,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2969650.0, ans=0.2 2024-08-15 02:41:42,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7150, loss[loss=0.1045, beats_loss=0.01068, ecapa_loss=0.0001602, whisper_loss=0.09223, over 19764.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001526, whisper_loss=0.09091, over 3875470.96 frames. 
], batch size: 78, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:41:44,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2969850.0, ans=0.1 2024-08-15 02:41:46,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2969850.0, ans=0.125 2024-08-15 02:42:13,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0 2024-08-15 02:42:14,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2970050.0, ans=0.125 2024-08-15 02:42:43,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2970150.0, ans=0.05 2024-08-15 02:43:03,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7200, loss[loss=0.1046, beats_loss=0.009861, ecapa_loss=0.0001759, whisper_loss=0.09301, over 22088.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001512, whisper_loss=0.09112, over 3900572.89 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:43:28,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. limit=10.0 2024-08-15 02:43:57,660 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-15 02:43:59,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2970650.0, ans=0.025 2024-08-15 02:44:03,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.341e+01 2.613e+01 2.912e+01 4.502e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-15 02:44:24,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7250, loss[loss=0.1065, beats_loss=0.01264, ecapa_loss=0.0001058, whisper_loss=0.09283, over 14976.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001507, whisper_loss=0.09038, over 3922077.01 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:44:33,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2970850.0, ans=0.2 2024-08-15 02:44:35,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2970850.0, ans=0.125 2024-08-15 02:44:44,481 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 02:44:51,660 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 02:45:01,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2971050.0, ans=0.0 2024-08-15 02:45:05,855 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 02:45:30,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.79 vs. 
limit=12.0 2024-08-15 02:45:31,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2971250.0, ans=0.2 2024-08-15 02:45:47,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7300, loss[loss=0.09763, beats_loss=0.01165, ecapa_loss=0.0001547, whisper_loss=0.08444, over 22825.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001515, whisper_loss=0.09032, over 3911071.23 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:45:56,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2024-08-15 02:46:00,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2971350.0, ans=0.125 2024-08-15 02:46:05,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2971450.0, ans=0.125 2024-08-15 02:46:05,866 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:46:24,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2971550.0, ans=0.0 2024-08-15 02:46:29,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-15 02:46:37,306 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 02:46:39,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2971650.0, ans=0.125 2024-08-15 02:46:41,553 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
43 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 02:46:46,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.342e+01 2.606e+01 2.963e+01 2.884e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 02:47:00,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2971750.0, ans=0.0 2024-08-15 02:47:09,937 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7350, loss[loss=0.1174, beats_loss=0.009354, ecapa_loss=0.0001585, whisper_loss=0.1065, over 17596.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001521, whisper_loss=0.09033, over 3887679.73 frames. ], batch size: 68, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:47:15,319 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 02:47:15,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2971850.0, ans=0.125 2024-08-15 02:48:07,360 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 02:48:32,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7400, loss[loss=0.08795, beats_loss=0.008945, ecapa_loss=0.0001579, whisper_loss=0.07742, over 14715.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001515, whisper_loss=0.09057, over 3891966.00 frames. ], batch size: 59, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:48:39,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. 
limit=15.0 2024-08-15 02:48:44,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2972350.0, ans=0.2 2024-08-15 02:48:47,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2972450.0, ans=0.1 2024-08-15 02:48:58,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2972450.0, ans=0.125 2024-08-15 02:49:03,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2972550.0, ans=0.0 2024-08-15 02:49:31,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.322e+01 2.605e+01 2.983e+01 4.527e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 02:49:38,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2972750.0, ans=0.2 2024-08-15 02:49:48,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2972750.0, ans=0.125 2024-08-15 02:49:48,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2972750.0, ans=0.125 2024-08-15 02:49:50,058 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:49:53,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7450, loss[loss=0.1145, beats_loss=0.00997, ecapa_loss=0.0001553, whisper_loss=0.103, over 21456.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001519, whisper_loss=0.09132, over 3908959.08 frames. 
], batch size: 87, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:49:58,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2972850.0, ans=0.125 2024-08-15 02:50:00,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2972850.0, ans=0.0 2024-08-15 02:50:04,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2972850.0, ans=0.125 2024-08-15 02:50:22,005 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 02:50:29,470 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 02:50:40,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2973050.0, ans=0.0 2024-08-15 02:50:50,856 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 02:50:51,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2973150.0, ans=0.1 2024-08-15 02:51:15,785 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 02:51:16,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7500, loss[loss=0.1166, beats_loss=0.00987, ecapa_loss=0.0001747, whisper_loss=0.105, over 19661.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09092, over 3884049.69 frames. ], batch size: 78, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:51:17,411 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 02:51:22,891 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 02:51:37,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2973450.0, ans=0.125 2024-08-15 02:51:41,785 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 02:52:01,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2024-08-15 02:52:15,812 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.356e+01 2.622e+01 2.952e+01 4.347e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-15 02:52:27,469 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 02:52:27,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2973750.0, ans=0.125 2024-08-15 02:52:38,996 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7550, loss[loss=0.1007, beats_loss=0.01259, ecapa_loss=0.0001166, whisper_loss=0.08691, over 20881.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001526, whisper_loss=0.09082, over 3854342.69 frames. ], batch size: 77, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:52:49,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2973850.0, ans=0.125 2024-08-15 02:52:49,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. limit=10.0 2024-08-15 02:52:58,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2973950.0, ans=0.0 2024-08-15 02:53:07,184 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 02:53:16,667 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 02:53:46,401 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 02:53:55,681 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 02:53:57,355 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-15 02:53:58,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7600, loss[loss=0.08729, beats_loss=0.00889, ecapa_loss=0.0002065, whisper_loss=0.07633, over 16826.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001543, whisper_loss=0.09017, over 3849691.51 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:54:09,461 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 02:54:26,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2974450.0, ans=0.0 2024-08-15 02:54:54,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2974650.0, ans=0.125 2024-08-15 02:54:55,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.311e+01 2.587e+01 3.162e+01 4.205e+02, threshold=5.175e+01, percent-clipped=3.0 2024-08-15 02:54:56,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-15 02:55:01,433 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 02:55:06,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2974750.0, ans=0.2 2024-08-15 02:55:06,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2974750.0, ans=0.0 2024-08-15 02:55:08,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2974750.0, ans=0.05 2024-08-15 02:55:15,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-15 02:55:16,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2974850.0, ans=0.125 2024-08-15 02:55:17,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7650, loss[loss=0.08607, beats_loss=0.009734, ecapa_loss=0.0001426, whisper_loss=0.07491, over 15158.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001539, whisper_loss=0.09012, over 3869356.06 frames. ], batch size: 60, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:55:22,864 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 02:55:23,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2974850.0, ans=0.125 2024-08-15 02:55:35,297 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 02:55:51,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2975050.0, ans=0.125 2024-08-15 02:56:02,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2975150.0, ans=0.09899494936611666 2024-08-15 02:56:32,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2975250.0, ans=0.1 2024-08-15 02:56:35,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7700, loss[loss=0.09754, beats_loss=0.01137, ecapa_loss=0.0001296, whisper_loss=0.08487, over 14967.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001529, whisper_loss=0.08983, over 3870363.68 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:57:22,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2975650.0, ans=0.125 2024-08-15 02:57:31,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.248e+01 2.489e+01 2.817e+01 2.674e+02, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 02:57:42,786 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 02:57:46,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=12.0 2024-08-15 02:57:50,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2975750.0, ans=0.125 2024-08-15 02:57:52,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7750, loss[loss=0.08971, beats_loss=0.01308, ecapa_loss=0.0001171, whisper_loss=0.07547, over 15669.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.0001536, whisper_loss=0.08984, over 3880722.08 frames. ], batch size: 61, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:57:54,688 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 02:58:18,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2975950.0, ans=0.0 2024-08-15 02:58:25,419 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 02:58:25,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2976050.0, ans=0.5 2024-08-15 02:58:25,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2976050.0, ans=0.0 2024-08-15 02:58:44,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2976150.0, ans=0.125 2024-08-15 02:59:05,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2976250.0, ans=0.0 2024-08-15 02:59:09,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7800, loss[loss=0.09684, beats_loss=0.01432, ecapa_loss=9.7e-05, whisper_loss=0.08155, over 22264.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001522, whisper_loss=0.0903, over 3894238.97 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:59:12,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2976350.0, ans=0.0 2024-08-15 02:59:13,256 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 25 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-15 02:59:14,681 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
25 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-15 02:59:22,467 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 02:59:25,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2976450.0, ans=0.07 2024-08-15 02:59:34,582 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 02:59:37,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2976450.0, ans=0.125 2024-08-15 02:59:45,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2976550.0, ans=0.125 2024-08-15 02:59:58,696 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 03:00:01,999 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 03:00:06,179 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.382e+01 2.617e+01 2.970e+01 1.321e+02, threshold=5.235e+01, percent-clipped=4.0 2024-08-15 03:00:29,544 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-15 03:00:30,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7850, loss[loss=0.08762, beats_loss=0.01166, ecapa_loss=0.0001588, whisper_loss=0.07437, over 21726.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.000153, whisper_loss=0.09072, over 3897929.74 frames. 
], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:00:40,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2976850.0, ans=0.125 2024-08-15 03:00:58,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-15 03:01:02,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2977050.0, ans=0.125 2024-08-15 03:01:53,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7900, loss[loss=0.1092, beats_loss=0.01174, ecapa_loss=0.0001264, whisper_loss=0.09618, over 22301.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001523, whisper_loss=0.09099, over 3930569.71 frames. ], batch size: 85, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:02:00,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2977350.0, ans=0.125 2024-08-15 03:02:02,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2977350.0, ans=0.125 2024-08-15 03:02:07,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2977350.0, ans=0.125 2024-08-15 03:02:09,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2977450.0, ans=0.0 2024-08-15 03:02:27,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2977550.0, ans=0.1 2024-08-15 03:02:53,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.325e+01 2.726e+01 3.089e+01 1.885e+02, threshold=5.452e+01, percent-clipped=1.0 
2024-08-15 03:03:15,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 7950, loss[loss=0.09423, beats_loss=0.01111, ecapa_loss=0.0001428, whisper_loss=0.08169, over 23010.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.000152, whisper_loss=0.09068, over 3904385.14 frames. ], batch size: 88, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:03:16,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2024-08-15 03:03:17,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2977850.0, ans=0.125 2024-08-15 03:03:29,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2977850.0, ans=0.0 2024-08-15 03:03:32,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2977950.0, ans=0.0 2024-08-15 03:03:42,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-15 03:03:45,196 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 03:04:03,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.82 vs. 
limit=15.0 2024-08-15 03:04:04,343 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.579e-03 2024-08-15 03:04:10,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2978150.0, ans=0.05 2024-08-15 03:04:37,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8000, loss[loss=0.1055, beats_loss=0.01253, ecapa_loss=0.000128, whisper_loss=0.0917, over 23062.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001519, whisper_loss=0.09033, over 3893848.12 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:04:54,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-15 03:04:55,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2978450.0, ans=0.125 2024-08-15 03:05:19,108 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 03:05:35,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.399e+01 2.701e+01 3.155e+01 4.080e+02, threshold=5.401e+01, percent-clipped=3.0 2024-08-15 03:05:45,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.74 vs. limit=10.0 2024-08-15 03:05:51,976 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 03:05:52,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2978750.0, ans=0.125 2024-08-15 03:05:55,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2024-08-15 03:05:55,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2978750.0, ans=15.0 2024-08-15 03:05:57,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8050, loss[loss=0.1102, beats_loss=0.00953, ecapa_loss=0.0001665, whisper_loss=0.099, over 23454.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001519, whisper_loss=0.09043, over 3903410.26 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:06:13,166 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-15 03:06:16,201 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 03:06:17,844 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 03:06:38,242 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 03:06:40,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=2979050.0, ans=0.2 2024-08-15 03:06:45,918 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 03:06:53,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2979150.0, ans=0.0 2024-08-15 03:07:01,882 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 03:07:04,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.19 vs. limit=22.5 2024-08-15 03:07:17,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8100, loss[loss=0.1065, beats_loss=0.008944, ecapa_loss=0.0001768, whisper_loss=0.09574, over 18011.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001524, whisper_loss=0.09048, over 3884604.16 frames. ], batch size: 71, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:07:20,760 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 03:07:30,799 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-15 03:07:35,228 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-15 03:07:36,949 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 03:08:08,780 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 03:08:16,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.318e+01 2.516e+01 2.878e+01 5.938e+01, threshold=5.033e+01, percent-clipped=1.0 2024-08-15 03:08:24,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. 
limit=6.0 2024-08-15 03:08:25,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2979750.0, ans=0.0 2024-08-15 03:08:34,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2979750.0, ans=0.1 2024-08-15 03:08:38,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8150, loss[loss=0.1116, beats_loss=0.01028, ecapa_loss=0.000131, whisper_loss=0.1, over 22629.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.000151, whisper_loss=0.09053, over 3871298.85 frames. ], batch size: 89, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:08:57,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2979950.0, ans=0.2 2024-08-15 03:09:07,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2024-08-15 03:09:12,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2980050.0, ans=0.125 2024-08-15 03:09:21,746 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-15 03:09:23,305 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-15 03:09:27,890 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 03:09:44,193 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 03:10:00,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8200, loss[loss=0.09499, beats_loss=0.01069, ecapa_loss=0.0001613, whisper_loss=0.08268, over 17669.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001511, whisper_loss=0.09094, over 3867182.76 frames. ], batch size: 71, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:10:00,975 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 03:10:10,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2980350.0, ans=0.125 2024-08-15 03:10:19,114 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:10:49,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2980650.0, ans=0.125 2024-08-15 03:10:57,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-08-15 03:11:00,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.326e+01 2.553e+01 2.974e+01 4.367e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-15 03:11:12,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2980750.0, ans=0.125 2024-08-15 03:11:14,316 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-15 03:11:15,681 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 03:11:23,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8250, loss[loss=0.09566, beats_loss=0.01075, ecapa_loss=0.000155, whisper_loss=0.08336, over 16053.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001514, whisper_loss=0.08982, over 3854370.86 frames. 
], batch size: 64, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:11:26,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2980850.0, ans=0.125 2024-08-15 03:11:50,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2980950.0, ans=0.0 2024-08-15 03:11:59,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2981050.0, ans=0.125 2024-08-15 03:12:11,226 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-15 03:12:20,974 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 03:12:32,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2981250.0, ans=0.125 2024-08-15 03:12:34,294 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2024-08-15 03:12:47,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8300, loss[loss=0.1088, beats_loss=0.009048, ecapa_loss=0.0001367, whisper_loss=0.09834, over 16280.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001508, whisper_loss=0.09018, over 3838899.40 frames. ], batch size: 61, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:12:56,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2981350.0, ans=0.125 2024-08-15 03:13:15,670 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.257e-02 2024-08-15 03:13:20,678 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
21 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 03:13:21,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2024-08-15 03:13:41,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2981650.0, ans=0.0 2024-08-15 03:13:47,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2024-08-15 03:13:47,552 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.355e+01 2.573e+01 2.835e+01 6.620e+01, threshold=5.146e+01, percent-clipped=1.0 2024-08-15 03:13:47,806 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 03:13:52,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.53 vs. limit=10.0 2024-08-15 03:13:58,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2981750.0, ans=0.125 2024-08-15 03:14:11,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8350, loss[loss=0.1307, beats_loss=0.0102, ecapa_loss=0.0001406, whisper_loss=0.1191, over 17654.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001526, whisper_loss=0.09003, over 3865005.99 frames. ], batch size: 68, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:14:14,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2981850.0, ans=0.0 2024-08-15 03:14:19,807 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
20 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-15 03:14:20,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2024-08-15 03:14:22,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2024-08-15 03:14:29,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=15.0 2024-08-15 03:14:29,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2981950.0, ans=0.125 2024-08-15 03:15:21,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2982250.0, ans=0.2 2024-08-15 03:15:31,921 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 03:15:33,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0 2024-08-15 03:15:34,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8400, loss[loss=0.08174, beats_loss=0.0112, ecapa_loss=0.0001512, whisper_loss=0.06902, over 22325.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001525, whisper_loss=0.09047, over 3894463.31 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:15:37,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2982350.0, ans=0.0 2024-08-15 03:15:52,835 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
27 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 03:15:59,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2982450.0, ans=0.125 2024-08-15 03:16:04,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2982450.0, ans=0.0 2024-08-15 03:16:11,249 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 03:16:15,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2982550.0, ans=15.0 2024-08-15 03:16:20,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2982550.0, ans=0.1 2024-08-15 03:16:35,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2024-08-15 03:16:36,523 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.324e+01 2.482e+01 2.790e+01 5.297e+01, threshold=4.963e+01, percent-clipped=1.0 2024-08-15 03:17:02,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8450, loss[loss=0.1027, beats_loss=0.009612, ecapa_loss=0.0001682, whisper_loss=0.09138, over 21024.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001535, whisper_loss=0.09112, over 3904893.08 frames. ], batch size: 86, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:17:04,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2982850.0, ans=0.125 2024-08-15 03:17:06,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.32 vs. 
limit=15.0 2024-08-15 03:17:17,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2982950.0, ans=0.0 2024-08-15 03:17:22,888 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 03:17:24,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2982950.0, ans=0.125 2024-08-15 03:17:51,335 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 03:18:18,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-15 03:18:21,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-15 03:18:23,508 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8500, loss[loss=0.1069, beats_loss=0.01214, ecapa_loss=0.0001169, whisper_loss=0.09362, over 23631.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.0001531, whisper_loss=0.09148, over 3924282.41 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:18:30,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2983350.0, ans=0.0 2024-08-15 03:18:48,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2983450.0, ans=0.1 2024-08-15 03:18:49,988 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-15 03:19:01,177 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
29 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 03:19:01,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2983550.0, ans=0.125 2024-08-15 03:19:21,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.305e+01 2.558e+01 2.940e+01 1.198e+02, threshold=5.115e+01, percent-clipped=1.0 2024-08-15 03:19:42,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2983750.0, ans=0.0 2024-08-15 03:19:46,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8550, loss[loss=0.1318, beats_loss=0.009935, ecapa_loss=0.0001075, whisper_loss=0.1208, over 17818.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01055, ecapa_loss=0.0001529, whisper_loss=0.09163, over 3922874.31 frames. ], batch size: 64, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:20:12,526 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 03:20:18,245 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-15 03:20:22,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2984050.0, ans=0.125 2024-08-15 03:20:35,346 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-15 03:20:59,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2984250.0, ans=0.0 2024-08-15 03:21:07,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8600, loss[loss=0.1122, beats_loss=0.00997, ecapa_loss=0.0001387, whisper_loss=0.1008, over 14841.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.0001528, whisper_loss=0.09171, over 3888916.92 frames. 
], batch size: 56, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:21:25,103 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 03:21:33,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2984450.0, ans=0.125 2024-08-15 03:21:53,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2984550.0, ans=0.125 2024-08-15 03:22:05,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.06 vs. limit=10.0 2024-08-15 03:22:06,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.403e+01 2.689e+01 2.856e+01 3.948e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-15 03:22:17,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2984750.0, ans=0.1 2024-08-15 03:22:29,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8650, loss[loss=0.114, beats_loss=0.009765, ecapa_loss=0.0001678, whisper_loss=0.1026, over 18378.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001533, whisper_loss=0.09085, over 3895012.80 frames. ], batch size: 73, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:22:29,487 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 03:22:39,091 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 03:22:55,990 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
16 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 03:22:59,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2984950.0, ans=0.125 2024-08-15 03:23:01,115 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 03:23:11,413 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 03:23:18,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.24 vs. limit=10.0 2024-08-15 03:23:24,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2985150.0, ans=0.125 2024-08-15 03:23:26,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2985150.0, ans=0.0 2024-08-15 03:23:27,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=12.0 2024-08-15 03:23:31,291 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-15 03:23:43,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2985250.0, ans=0.125 2024-08-15 03:23:47,650 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 32 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 03:23:55,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8700, loss[loss=0.1259, beats_loss=0.008358, ecapa_loss=0.0001726, whisper_loss=0.1158, over 20025.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001544, whisper_loss=0.09073, over 3856875.40 frames. ], batch size: 80, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:24:00,236 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-15 03:24:09,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2985350.0, ans=0.0 2024-08-15 03:24:09,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2985350.0, ans=0.2 2024-08-15 03:24:14,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-15 03:25:05,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.433e+01 2.657e+01 2.885e+01 1.161e+02, threshold=5.314e+01, percent-clipped=2.0 2024-08-15 03:25:24,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2985750.0, ans=0.1 2024-08-15 03:25:30,699 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8750, loss[loss=0.09256, beats_loss=0.01178, ecapa_loss=0.0001388, whisper_loss=0.07939, over 22400.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001549, whisper_loss=0.09088, over 3846621.43 frames. ], batch size: 89, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:25:43,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2985850.0, ans=22.5 2024-08-15 03:25:50,593 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 03:26:31,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2986150.0, ans=0.125 2024-08-15 03:26:33,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. 
limit=15.0 2024-08-15 03:26:34,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2986150.0, ans=0.125 2024-08-15 03:26:45,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2986250.0, ans=0.1 2024-08-15 03:26:51,165 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 34 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 03:26:52,487 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 03:27:02,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8800, loss[loss=0.103, beats_loss=0.01183, ecapa_loss=0.0001519, whisper_loss=0.08962, over 22817.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001549, whisper_loss=0.09068, over 3832042.40 frames. ], batch size: 90, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:27:04,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2986350.0, ans=0.0 2024-08-15 03:27:30,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-15 03:28:05,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.266e+01 2.513e+01 2.884e+01 4.202e+01, threshold=5.025e+01, percent-clipped=0.0 2024-08-15 03:28:21,169 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 03:28:28,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8850, loss[loss=0.1049, beats_loss=0.008601, ecapa_loss=0.0001568, whisper_loss=0.09472, over 14661.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.000155, whisper_loss=0.09052, over 3818613.11 frames. 
], batch size: 54, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:28:36,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2986850.0, ans=0.125 2024-08-15 03:28:49,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2986950.0, ans=0.125 2024-08-15 03:29:16,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2987050.0, ans=0.0 2024-08-15 03:29:56,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8900, loss[loss=0.08128, beats_loss=0.01289, ecapa_loss=0.0001454, whisper_loss=0.06694, over 18434.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001547, whisper_loss=0.09024, over 3842246.13 frames. ], batch size: 73, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:30:05,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2987350.0, ans=0.125 2024-08-15 03:30:07,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2987350.0, ans=0.0 2024-08-15 03:30:15,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2987450.0, ans=0.0 2024-08-15 03:31:01,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.296e+01 2.671e+01 2.935e+01 5.477e+01, threshold=5.343e+01, percent-clipped=1.0 2024-08-15 03:31:08,189 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 03:31:18,854 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 03:31:27,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 8950, loss[loss=0.1051, beats_loss=0.01073, ecapa_loss=0.0001687, whisper_loss=0.09268, over 22116.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.000155, whisper_loss=0.08982, over 3814722.25 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:31:31,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2987850.0, ans=0.025 2024-08-15 03:31:35,490 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 03:31:35,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2987850.0, ans=0.0 2024-08-15 03:31:58,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2987950.0, ans=0.0 2024-08-15 03:32:00,158 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 03:32:24,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2988150.0, ans=0.125 2024-08-15 03:32:33,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2988150.0, ans=0.0 2024-08-15 03:32:49,913 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-15 03:32:59,176 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9000, loss[loss=0.09745, beats_loss=0.01082, ecapa_loss=0.0001492, whisper_loss=0.08513, over 18028.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001549, whisper_loss=0.09052, over 3823607.34 frames. 
], batch size: 75, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:32:59,179 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 03:33:42,745 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on ASR_libri: loss=0.2525, beats_loss=0, ecapa_loss=0.0005419, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 03:34:03,175 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on SV_voxceleb1: loss=0.004236, beats_loss=0, ecapa_loss=0.0004236, whisper_loss=0, over 939242.00 frames. 2024-08-15 03:34:57,315 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1170, 3.1282, 3.2687, 3.0202], device='cuda:0') 2024-08-15 03:35:55,386 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 03:35:55,391 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 03:35:55,624 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 03:35:56,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-15 03:35:57,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2988350.0, ans=0.0 2024-08-15 03:36:35,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2988550.0, ans=0.125 2024-08-15 03:36:43,217 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-15 03:36:48,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2988650.0, ans=0.2 2024-08-15 03:36:53,229 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 03:36:54,593 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-15 03:36:54,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2988650.0, ans=0.07 2024-08-15 03:37:00,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.331e+01 2.598e+01 2.772e+01 4.416e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 03:37:22,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9050, loss[loss=0.0986, beats_loss=0.01076, ecapa_loss=0.0001531, whisper_loss=0.08631, over 22426.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001542, whisper_loss=0.09157, over 3814655.74 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:37:33,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2988850.0, ans=15.0 2024-08-15 03:38:03,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2989050.0, ans=0.2 2024-08-15 03:38:10,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2989050.0, ans=0.2 2024-08-15 03:38:26,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.52 vs. 
limit=15.0 2024-08-15 03:38:30,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2989150.0, ans=0.2 2024-08-15 03:38:33,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2989150.0, ans=0.1 2024-08-15 03:38:37,250 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-15 03:38:43,201 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 03:38:56,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9100, loss[loss=0.1175, beats_loss=0.008767, ecapa_loss=0.0001623, whisper_loss=0.1072, over 23142.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01045, ecapa_loss=0.0001544, whisper_loss=0.09197, over 3852468.88 frames. ], batch size: 93, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:38:59,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2989350.0, ans=0.125 2024-08-15 03:39:01,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2989350.0, ans=0.95 2024-08-15 03:39:01,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2989350.0, ans=0.0 2024-08-15 03:39:04,106 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
11 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-15 03:39:23,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2989450.0, ans=0.0 2024-08-15 03:39:33,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2989450.0, ans=0.125 2024-08-15 03:39:35,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=22.5 2024-08-15 03:40:10,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.390e+01 2.731e+01 3.078e+01 3.225e+02, threshold=5.461e+01, percent-clipped=2.0 2024-08-15 03:40:20,037 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-15 03:40:35,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9150, loss[loss=0.1227, beats_loss=0.00837, ecapa_loss=0.0002054, whisper_loss=0.1123, over 22137.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01046, ecapa_loss=0.0001534, whisper_loss=0.09182, over 3862240.47 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:40:38,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2989850.0, ans=0.2 2024-08-15 03:40:45,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2989850.0, ans=0.125 2024-08-15 03:40:45,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2989850.0, ans=0.5 2024-08-15 03:41:00,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-15 03:41:30,680 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
25 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-15 03:41:51,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2990250.0, ans=0.5 2024-08-15 03:42:04,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9200, loss[loss=0.08763, beats_loss=0.01122, ecapa_loss=0.0001399, whisper_loss=0.07501, over 14151.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01049, ecapa_loss=0.0001531, whisper_loss=0.09194, over 3874954.87 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:42:13,385 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 03:42:13,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2990350.0, ans=0.125 2024-08-15 03:42:34,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2990450.0, ans=0.0 2024-08-15 03:42:36,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2990450.0, ans=0.0 2024-08-15 03:42:39,388 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-15 03:42:46,149 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 03:42:48,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. 
limit=15.0 2024-08-15 03:43:12,179 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.340e+01 2.560e+01 2.896e+01 2.197e+02, threshold=5.119e+01, percent-clipped=4.0 2024-08-15 03:43:18,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2990750.0, ans=0.125 2024-08-15 03:43:23,724 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 34 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 03:43:35,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9250, loss[loss=0.1217, beats_loss=0.009523, ecapa_loss=0.0001518, whisper_loss=0.1106, over 17106.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01046, ecapa_loss=0.000154, whisper_loss=0.09211, over 3874875.03 frames. ], batch size: 65, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:43:48,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2990850.0, ans=0.125 2024-08-15 03:44:10,365 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 03:44:21,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2991050.0, ans=0.1 2024-08-15 03:44:47,647 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 03:44:49,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2991150.0, ans=0.125 2024-08-15 03:44:51,156 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 03:44:58,704 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-15 03:44:58,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2991250.0, ans=0.125 2024-08-15 03:45:08,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9300, loss[loss=0.0881, beats_loss=0.01118, ecapa_loss=0.0001357, whisper_loss=0.07556, over 21752.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001524, whisper_loss=0.0916, over 3890204.67 frames. ], batch size: 88, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:45:25,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2991450.0, ans=0.2 2024-08-15 03:45:42,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2991450.0, ans=0.0 2024-08-15 03:45:43,613 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 03:46:10,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2991650.0, ans=0.125 2024-08-15 03:46:18,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.307e+01 2.589e+01 2.834e+01 3.793e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-15 03:46:40,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5 2024-08-15 03:46:42,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9350, loss[loss=0.07207, beats_loss=0.01173, ecapa_loss=0.0001478, whisper_loss=0.05887, over 22047.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01053, ecapa_loss=0.0001531, whisper_loss=0.09193, over 3923114.77 frames. 
], batch size: 94, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:46:44,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2991850.0, ans=0.125 2024-08-15 03:46:46,222 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-15 03:46:48,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2991850.0, ans=0.0 2024-08-15 03:46:55,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2991850.0, ans=0.0 2024-08-15 03:47:08,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=12.0 2024-08-15 03:47:22,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-15 03:47:33,255 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 03:48:08,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9400, loss[loss=0.1034, beats_loss=0.01164, ecapa_loss=0.0001556, whisper_loss=0.09025, over 15588.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001527, whisper_loss=0.09147, over 3922594.29 frames. 
], batch size: 64, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:48:12,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2992350.0, ans=0.125 2024-08-15 03:48:25,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2992450.0, ans=0.0 2024-08-15 03:48:30,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2992450.0, ans=0.2 2024-08-15 03:48:34,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=12.0 2024-08-15 03:49:03,842 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.264e+05 2024-08-15 03:49:11,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.345e+01 2.543e+01 2.847e+01 7.002e+01, threshold=5.086e+01, percent-clipped=1.0 2024-08-15 03:49:16,573 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-15 03:49:21,511 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-15 03:49:21,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2992750.0, ans=0.125 2024-08-15 03:49:24,817 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 03:49:29,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2992750.0, ans=0.125 2024-08-15 03:49:32,141 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9450, loss[loss=0.09471, beats_loss=0.01188, ecapa_loss=0.0001334, whisper_loss=0.08149, over 22528.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001524, whisper_loss=0.09107, over 3912001.46 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:49:39,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0 2024-08-15 03:49:47,335 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-15 03:49:47,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2992950.0, ans=0.125 2024-08-15 03:49:49,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2992950.0, ans=0.025 2024-08-15 03:50:02,931 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 03:50:16,064 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
21 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-15 03:50:19,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2993050.0, ans=0.125 2024-08-15 03:50:21,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2993050.0, ans=0.1 2024-08-15 03:50:34,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2993150.0, ans=0.125 2024-08-15 03:50:40,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2993250.0, ans=0.0 2024-08-15 03:50:51,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2993250.0, ans=0.1 2024-08-15 03:50:58,892 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9500, loss[loss=0.1098, beats_loss=0.009151, ecapa_loss=0.0001567, whisper_loss=0.09912, over 15003.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001521, whisper_loss=0.09137, over 3926656.88 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:50:59,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2993350.0, ans=0.0 2024-08-15 03:51:00,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0 2024-08-15 03:51:25,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2993450.0, ans=0.125 2024-08-15 03:51:26,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.72 vs. 
limit=22.5 2024-08-15 03:51:30,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2993450.0, ans=0.07 2024-08-15 03:51:32,647 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 03:51:38,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2993550.0, ans=6.0 2024-08-15 03:51:45,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2993550.0, ans=0.0 2024-08-15 03:51:47,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2993550.0, ans=0.2 2024-08-15 03:52:05,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2993650.0, ans=0.0 2024-08-15 03:52:05,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.283e+01 2.545e+01 2.911e+01 4.109e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 03:52:06,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2993650.0, ans=0.125 2024-08-15 03:52:08,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2993650.0, ans=0.2 2024-08-15 03:52:15,606 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 03:52:29,555 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9550, loss[loss=0.07408, beats_loss=0.01146, ecapa_loss=0.0001535, whisper_loss=0.06109, over 13624.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001527, whisper_loss=0.09069, over 3905732.06 frames. 
], batch size: 57, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:52:33,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2993850.0, ans=0.0 2024-08-15 03:52:41,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2024-08-15 03:53:05,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2994050.0, ans=0.2 2024-08-15 03:53:13,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=22.5 2024-08-15 03:53:16,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2994050.0, ans=0.0 2024-08-15 03:53:21,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.03 vs. 
limit=22.5 2024-08-15 03:53:23,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2994050.0, ans=10.0 2024-08-15 03:53:35,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2994150.0, ans=0.1 2024-08-15 03:53:46,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2994250.0, ans=0.05 2024-08-15 03:54:00,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2994350.0, ans=10.0 2024-08-15 03:54:00,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2994350.0, ans=0.09899494936611666 2024-08-15 03:54:00,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9600, loss[loss=0.0968, beats_loss=0.01041, ecapa_loss=0.0001925, whisper_loss=0.08446, over 14258.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001525, whisper_loss=0.09049, over 3890012.90 frames. ], batch size: 59, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:54:06,463 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 03:54:17,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. limit=6.0 2024-08-15 03:54:24,594 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-15 03:54:55,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2994650.0, ans=0.04949747468305833 2024-08-15 03:54:58,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.89 vs. limit=10.0 2024-08-15 03:55:00,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-15 03:55:03,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2994650.0, ans=0.1 2024-08-15 03:55:13,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.340e+01 2.536e+01 2.906e+01 4.631e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-15 03:55:21,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2994750.0, ans=0.125 2024-08-15 03:55:33,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=2994750.0, ans=0.2 2024-08-15 03:55:43,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9650, loss[loss=0.1125, beats_loss=0.01034, ecapa_loss=0.0001536, whisper_loss=0.1006, over 22398.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001523, whisper_loss=0.09049, over 3852229.38 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:55:52,023 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 20 from Vox, 15 fro AS 2024-08-15 03:55:52,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2994850.0, ans=0.07 2024-08-15 03:55:58,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2994850.0, ans=0.125 2024-08-15 03:56:10,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2994950.0, ans=0.125 2024-08-15 03:56:12,108 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 03:56:18,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-08-15 03:56:31,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2995050.0, ans=0.0 2024-08-15 03:56:37,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-08-15 03:57:14,135 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 35 from Vox, 36 fro AS 2024-08-15 03:57:21,932 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-15 03:57:29,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9700, loss[loss=0.07687, beats_loss=0.01307, ecapa_loss=0.0001685, whisper_loss=0.06211, over 21401.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001534, whisper_loss=0.09075, over 3860423.44 frames. ], batch size: 94, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:58:18,462 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 03:58:18,749 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.573e-02 2024-08-15 03:58:24,908 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 03:59:10,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.634e+01 2.369e+01 2.652e+01 2.894e+01 3.989e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-15 03:59:12,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2995650.0, ans=0.2 2024-08-15 03:59:12,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2995650.0, ans=0.2 2024-08-15 03:59:45,166 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 03:59:46,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9750, loss[loss=0.09763, beats_loss=0.01028, ecapa_loss=0.0001311, whisper_loss=0.08603, over 21413.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001524, whisper_loss=0.09098, over 3878725.68 frames. ], batch size: 84, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:00:12,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2995950.0, ans=0.0 2024-08-15 04:00:27,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2995950.0, ans=0.1 2024-08-15 04:00:33,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.69 vs. 
limit=15.0 2024-08-15 04:00:36,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2996050.0, ans=0.09899494936611666 2024-08-15 04:01:05,294 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 04:01:08,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2996150.0, ans=0.125 2024-08-15 04:01:42,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2996250.0, ans=0.0 2024-08-15 04:01:45,608 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 04:01:53,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9800, loss[loss=0.117, beats_loss=0.01038, ecapa_loss=0.0001671, whisper_loss=0.105, over 22197.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001521, whisper_loss=0.0901, over 3884670.20 frames. ], batch size: 89, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:01:58,849 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 15 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 04:02:15,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.01 vs. 
limit=10.0 2024-08-15 04:02:18,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=2996350.0, ans=15.0 2024-08-15 04:02:33,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2996450.0, ans=0.0 2024-08-15 04:02:47,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2996550.0, ans=0.0 2024-08-15 04:03:15,634 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:03:22,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2024-08-15 04:03:25,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.294e+01 2.579e+01 3.082e+01 3.957e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-15 04:03:31,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2996750.0, ans=0.125 2024-08-15 04:03:38,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2996750.0, ans=0.1 2024-08-15 04:03:39,309 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 04:03:44,234 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 04:03:45,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9850, loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001422, whisper_loss=0.08973, over 23181.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001531, whisper_loss=0.09086, over 3895965.94 frames. 
], batch size: 94, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:03:47,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-08-15 04:03:54,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0 2024-08-15 04:04:02,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2996950.0, ans=0.125 2024-08-15 04:04:42,726 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-15 04:05:11,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9900, loss[loss=0.09398, beats_loss=0.01171, ecapa_loss=0.0001437, whisper_loss=0.08083, over 22323.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001518, whisper_loss=0.09129, over 3907129.12 frames. ], batch size: 91, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:05:38,128 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 04:06:02,467 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 04:06:12,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.308e+01 2.598e+01 3.035e+01 4.044e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 04:06:27,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2997750.0, ans=0.1 2024-08-15 04:06:28,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.71 vs. 
limit=15.0 2024-08-15 04:06:34,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 9950, loss[loss=0.08913, beats_loss=0.01014, ecapa_loss=0.0001944, whisper_loss=0.07705, over 22052.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001508, whisper_loss=0.09174, over 3906152.67 frames. ], batch size: 91, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:06:36,607 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-15 04:06:39,044 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 04:06:46,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-15 04:07:03,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2997950.0, ans=0.0 2024-08-15 04:07:52,695 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 04:08:01,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10000, loss[loss=0.09826, beats_loss=0.01157, ecapa_loss=0.0001208, whisper_loss=0.08548, over 20064.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.09153, over 3897880.81 frames. ], batch size: 78, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:08:55,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2998650.0, ans=0.0 2024-08-15 04:09:02,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.337e+01 2.587e+01 2.892e+01 1.142e+02, threshold=5.175e+01, percent-clipped=1.0 2024-08-15 04:09:18,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.88 vs. 
limit=22.5 2024-08-15 04:09:23,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10050, loss[loss=0.1038, beats_loss=0.009104, ecapa_loss=0.0001318, whisper_loss=0.09342, over 16112.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0105, ecapa_loss=0.0001513, whisper_loss=0.09204, over 3898886.71 frames. ], batch size: 60, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:09:47,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2998950.0, ans=0.2 2024-08-15 04:10:03,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2999050.0, ans=0.125 2024-08-15 04:10:09,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2999050.0, ans=0.125 2024-08-15 04:10:16,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2999150.0, ans=0.125 2024-08-15 04:10:19,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2999150.0, ans=0.125 2024-08-15 04:10:21,808 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 04:10:21,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2999150.0, ans=0.2 2024-08-15 04:10:30,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2999250.0, ans=0.125 2024-08-15 04:10:45,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2999250.0, ans=15.0 2024-08-15 04:10:47,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10100, loss[loss=0.102, beats_loss=0.01177, ecapa_loss=0.0001459, whisper_loss=0.0888, over 21572.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001513, whisper_loss=0.09145, over 3900647.08 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:10:47,551 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 04:11:04,978 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 11 from Vox, 45 fro AS 2024-08-15 04:11:43,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2999650.0, ans=0.1 2024-08-15 04:11:44,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.436e+01 2.693e+01 3.002e+01 5.180e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-15 04:12:05,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10150, loss[loss=0.09116, beats_loss=0.01016, ecapa_loss=0.0001549, whisper_loss=0.07945, over 16080.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001516, whisper_loss=0.09034, over 3891794.61 frames. 
], batch size: 64, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:12:13,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2999850.0, ans=0.0 2024-08-15 04:12:19,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2999950.0, ans=0.0 2024-08-15 04:12:26,919 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-300000.pt 2024-08-15 04:12:33,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2999950.0, ans=0.125 2024-08-15 04:12:46,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3000050.0, ans=0.125 2024-08-15 04:12:46,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3000050.0, ans=0.125 2024-08-15 04:13:00,493 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 04:13:11,619 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 04:13:12,355 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.77 vs. limit=15.0 2024-08-15 04:13:25,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10200, loss[loss=0.1064, beats_loss=0.01101, ecapa_loss=0.00013, whisper_loss=0.09412, over 14509.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001519, whisper_loss=0.08982, over 3902837.82 frames. 
], batch size: 55, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:13:27,328 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 04:13:38,310 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 04:13:55,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-15 04:13:59,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-15 04:14:09,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3000550.0, ans=0.1 2024-08-15 04:14:09,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=22.5 2024-08-15 04:14:18,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5 2024-08-15 04:14:20,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5 2024-08-15 04:14:23,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.355e+01 2.535e+01 2.807e+01 5.755e+01, threshold=5.070e+01, percent-clipped=1.0 2024-08-15 04:14:40,422 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 04:14:42,144 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 04:14:43,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10250, loss[loss=0.1249, beats_loss=0.009294, ecapa_loss=0.0001558, whisper_loss=0.114, over 19706.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001514, whisper_loss=0.09036, over 3903640.13 frames. ], batch size: 77, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:15:24,386 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 04:15:30,105 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 04:15:30,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3001150.0, ans=0.0 2024-08-15 04:15:32,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-15 04:15:40,642 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 04:15:56,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3001250.0, ans=0.0 2024-08-15 04:15:59,418 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-15 04:16:00,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10300, loss[loss=0.08741, beats_loss=0.01046, ecapa_loss=0.0001559, whisper_loss=0.07539, over 21814.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.000152, whisper_loss=0.09029, over 3889605.78 frames. 
], batch size: 90, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:16:07,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3001350.0, ans=0.125
2024-08-15 04:16:11,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.03 vs. limit=15.0
2024-08-15 04:16:12,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3001350.0, ans=0.5
2024-08-15 04:16:17,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-15 04:16:25,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3001450.0, ans=0.0
2024-08-15 04:16:59,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.405e+01 2.691e+01 3.048e+01 4.748e+01, threshold=5.382e+01, percent-clipped=0.0
2024-08-15 04:17:03,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3001750.0, ans=0.125
2024-08-15 04:17:03,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3001750.0, ans=0.125
2024-08-15 04:17:19,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10350, loss[loss=0.09974, beats_loss=0.01083, ecapa_loss=0.0001421, whisper_loss=0.08748, over 21665.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001515, whisper_loss=0.09041, over 3906450.16 frames. ], batch size: 87, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:17:26,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=15.0
2024-08-15 04:17:28,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3001850.0, ans=0.2
2024-08-15 04:17:29,335 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 04:17:40,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3001950.0, ans=0.1
2024-08-15 04:17:44,909 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS
2024-08-15 04:17:51,288 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 from AS
2024-08-15 04:18:15,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3002150.0, ans=0.125
2024-08-15 04:18:28,891 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 11 from LS+wenet, 21 from Vox, 28 from AS
2024-08-15 04:18:30,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3002250.0, ans=0.0
2024-08-15 04:18:40,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10400, loss[loss=0.1155, beats_loss=0.01019, ecapa_loss=0.0001329, whisper_loss=0.104, over 22463.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.09, over 3900233.78 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:19:13,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3002550.0, ans=0.0
2024-08-15 04:19:14,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3002550.0, ans=0.0
2024-08-15 04:19:26,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3002650.0, ans=0.125
2024-08-15 04:19:34,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.336e+01 2.571e+01 2.760e+01 5.271e+01, threshold=5.142e+01, percent-clipped=0.0
2024-08-15 04:19:53,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10450, loss[loss=0.1182, beats_loss=0.009873, ecapa_loss=0.0001298, whisper_loss=0.107, over 19693.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001514, whisper_loss=0.09057, over 3858865.83 frames. ], batch size: 75, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:20:06,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-08-15 04:20:13,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3002950.0, ans=0.125
2024-08-15 04:20:26,920 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 from AS
2024-08-15 04:20:30,886 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 24 from Vox, 30 from AS
2024-08-15 04:20:50,168 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 from AS
2024-08-15 04:20:53,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3003250.0, ans=0.0
2024-08-15 04:21:03,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3003350.0, ans=0.125
2024-08-15 04:21:04,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10500, loss[loss=0.1091, beats_loss=0.009184, ecapa_loss=0.0001539, whisper_loss=0.09836, over 20722.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001524, whisper_loss=0.09028, over 3865421.93 frames. ], batch size: 84, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:21:06,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3003350.0, ans=0.2
2024-08-15 04:21:10,624 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 27 from Vox, 25 from AS
2024-08-15 04:21:16,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3003450.0, ans=0.125
2024-08-15 04:21:31,142 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 from AS
2024-08-15 04:21:50,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3003650.0, ans=0.125
2024-08-15 04:21:54,329 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.267e+01 2.471e+01 2.846e+01 8.765e+01, threshold=4.941e+01, percent-clipped=1.0
2024-08-15 04:22:12,628 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10550, loss[loss=0.08379, beats_loss=0.009787, ecapa_loss=0.0001695, whisper_loss=0.07231, over 16429.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001523, whisper_loss=0.09027, over 3867701.18 frames. ], batch size: 63, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:22:14,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3003850.0, ans=0.125
2024-08-15 04:22:15,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3003850.0, ans=0.025
2024-08-15 04:22:34,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3003950.0, ans=0.125
2024-08-15 04:22:43,206 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 from AS
2024-08-15 04:22:47,771 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 from AS
2024-08-15 04:23:13,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3004250.0, ans=0.125
2024-08-15 04:23:21,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10600, loss[loss=0.1156, beats_loss=0.01065, ecapa_loss=0.0001498, whisper_loss=0.1034, over 23318.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001535, whisper_loss=0.09077, over 3888631.10 frames. ], batch size: 94, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:23:31,235 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 from AS
2024-08-15 04:23:44,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5
2024-08-15 04:23:49,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3004550.0, ans=0.0
2024-08-15 04:23:58,442 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 from AS
2024-08-15 04:24:12,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.388e+01 2.629e+01 3.044e+01 4.366e+02, threshold=5.258e+01, percent-clipped=2.0
2024-08-15 04:24:25,239 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 15 from Vox, 19 from AS
2024-08-15 04:24:25,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3004750.0, ans=0.125
2024-08-15 04:24:26,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3004750.0, ans=0.0
2024-08-15 04:24:29,554 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 from AS
2024-08-15 04:24:30,781 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10650, loss[loss=0.08841, beats_loss=0.01133, ecapa_loss=0.0001093, whisper_loss=0.07599, over 17033.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.0001518, whisper_loss=0.09156, over 3884385.14 frames. ], batch size: 65, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:24:41,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3004850.0, ans=0.0
2024-08-15 04:24:44,802 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 16 from Vox, 34 from AS
2024-08-15 04:25:08,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3005050.0, ans=0.1
2024-08-15 04:25:09,994 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 38 from LS+wenet, 18 from Vox, 38 from AS
2024-08-15 04:25:14,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3005150.0, ans=0.125
2024-08-15 04:25:15,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3005150.0, ans=0.125
2024-08-15 04:25:25,931 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 from AS
2024-08-15 04:25:35,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3005250.0, ans=0.0
2024-08-15 04:25:43,969 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10700, loss[loss=0.09253, beats_loss=0.00969, ecapa_loss=0.0001638, whisper_loss=0.08121, over 17718.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001502, whisper_loss=0.09093, over 3884618.44 frames. ], batch size: 69, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:25:46,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0
2024-08-15 04:25:51,017 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS
2024-08-15 04:26:08,568 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 04:26:08,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3005450.0, ans=0.0
2024-08-15 04:26:14,682 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS
2024-08-15 04:26:25,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.59 vs. limit=10.0
2024-08-15 04:26:32,622 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-15 04:26:39,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.322e+01 2.552e+01 2.947e+01 1.324e+02, threshold=5.105e+01, percent-clipped=0.0
2024-08-15 04:26:47,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3005750.0, ans=0.0
2024-08-15 04:26:49,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3005750.0, ans=0.0
2024-08-15 04:26:52,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0
2024-08-15 04:26:59,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10750, loss[loss=0.1183, beats_loss=0.009558, ecapa_loss=0.0001457, whisper_loss=0.1073, over 19778.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001513, whisper_loss=0.09155, over 3874911.44 frames. ], batch size: 75, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:27:21,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5
2024-08-15 04:27:30,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3006050.0, ans=0.125
2024-08-15 04:27:38,234 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 from AS
2024-08-15 04:27:50,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3006150.0, ans=0.1
2024-08-15 04:28:10,571 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 from AS
2024-08-15 04:28:16,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10800, loss[loss=0.1111, beats_loss=0.009686, ecapa_loss=0.0001488, whisper_loss=0.09991, over 20776.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01055, ecapa_loss=0.0001512, whisper_loss=0.09208, over 3857287.65 frames. ], batch size: 80, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:28:21,645 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-15 04:28:24,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3006350.0, ans=0.125
2024-08-15 04:28:24,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3006350.0, ans=0.125
2024-08-15 04:28:38,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3006450.0, ans=0.07
2024-08-15 04:29:12,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.426e+01 2.732e+01 3.113e+01 1.619e+02, threshold=5.464e+01, percent-clipped=2.0
2024-08-15 04:29:16,857 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 from AS
2024-08-15 04:29:18,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3006750.0, ans=0.0
2024-08-15 04:29:18,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3006750.0, ans=0.125
2024-08-15 04:29:31,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10850, loss[loss=0.108, beats_loss=0.01164, ecapa_loss=0.0001721, whisper_loss=0.0946, over 20795.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01066, ecapa_loss=0.0001508, whisper_loss=0.09177, over 3886454.24 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:29:52,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3006950.0, ans=0.2
2024-08-15 04:30:16,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3007050.0, ans=0.2
2024-08-15 04:30:39,755 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 04:30:45,616 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 from AS
2024-08-15 04:30:50,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10900, loss[loss=0.1211, beats_loss=0.00829, ecapa_loss=0.0001414, whisper_loss=0.1114, over 23286.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001505, whisper_loss=0.09141, over 3927811.64 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:30:52,340 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03
2024-08-15 04:30:55,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3007350.0, ans=0.2
2024-08-15 04:30:56,545 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 15 from Vox, 35 from AS
2024-08-15 04:31:03,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3007350.0, ans=0.125
2024-08-15 04:31:03,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2024-08-15 04:31:05,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0
2024-08-15 04:31:11,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3007450.0, ans=0.125
2024-08-15 04:31:15,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3007450.0, ans=0.0
2024-08-15 04:31:23,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3007550.0, ans=0.1
2024-08-15 04:31:47,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.315e+01 2.550e+01 2.913e+01 4.386e+01, threshold=5.099e+01, percent-clipped=0.0
2024-08-15 04:31:51,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3007750.0, ans=0.2
2024-08-15 04:31:56,530 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 from AS
2024-08-15 04:32:02,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3007750.0, ans=0.125
2024-08-15 04:32:05,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3007850.0, ans=0.5
2024-08-15 04:32:06,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 10950, loss[loss=0.1156, beats_loss=0.01171, ecapa_loss=0.0001574, whisper_loss=0.1023, over 21532.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001499, whisper_loss=0.09174, over 3939632.33 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:32:07,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3007850.0, ans=0.125
2024-08-15 04:32:11,178 INFO [train_multi_KD3.py:844] (0/4) A total of 99 cuts. 32 from LS+wenet, 22 from Vox, 45 from AS
2024-08-15 04:32:14,270 INFO [train_multi_KD3.py:844] (0/4) A total of 98 cuts. 26 from LS+wenet, 28 from Vox, 44 from AS
2024-08-15 04:32:22,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3007950.0, ans=0.0
2024-08-15 04:32:56,052 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 from AS
2024-08-15 04:33:16,328 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 16 from Vox, 37 from AS
2024-08-15 04:33:21,401 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 from AS
2024-08-15 04:33:22,431 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11000, loss[loss=0.09279, beats_loss=0.01055, ecapa_loss=0.0001376, whisper_loss=0.08086, over 18282.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.0001509, whisper_loss=0.0915, over 3942371.86 frames. ], batch size: 72, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:33:41,541 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS
2024-08-15 04:33:46,180 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS
2024-08-15 04:33:57,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3008550.0, ans=0.125
2024-08-15 04:34:03,654 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 from AS
2024-08-15 04:34:03,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3008550.0, ans=0.0
2024-08-15 04:34:20,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.435e+01 2.579e+01 2.993e+01 2.045e+02, threshold=5.158e+01, percent-clipped=2.0
2024-08-15 04:34:20,469 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 from AS
2024-08-15 04:34:22,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-08-15 04:34:37,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11050, loss[loss=0.08167, beats_loss=0.01048, ecapa_loss=0.0001737, whisper_loss=0.06945, over 17936.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001511, whisper_loss=0.09127, over 3937113.95 frames. ], batch size: 71, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:34:45,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3008850.0, ans=0.125
2024-08-15 04:34:55,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=15.0
2024-08-15 04:34:56,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0
2024-08-15 04:35:05,706 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS
2024-08-15 04:35:10,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3009050.0, ans=0.0
2024-08-15 04:35:15,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3009050.0, ans=0.1
2024-08-15 04:35:20,781 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS
2024-08-15 04:35:21,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3009150.0, ans=0.2
2024-08-15 04:35:22,440 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 from AS
2024-08-15 04:35:27,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.31 vs. limit=6.0
2024-08-15 04:35:31,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0
2024-08-15 04:35:32,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3009150.0, ans=0.0
2024-08-15 04:35:52,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11100, loss[loss=0.1109, beats_loss=0.01031, ecapa_loss=0.0001438, whisper_loss=0.09912, over 18262.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001508, whisper_loss=0.09098, over 3893177.25 frames. ], batch size: 71, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:36:02,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3009350.0, ans=0.0
2024-08-15 04:36:08,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3009450.0, ans=0.0
2024-08-15 04:36:14,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3009450.0, ans=0.1
2024-08-15 04:36:24,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3009550.0, ans=0.07
2024-08-15 04:36:32,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3009550.0, ans=0.0
2024-08-15 04:36:32,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0
2024-08-15 04:36:36,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3009650.0, ans=0.125
2024-08-15 04:36:48,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.388e+01 2.670e+01 2.959e+01 6.163e+01, threshold=5.341e+01, percent-clipped=1.0
2024-08-15 04:36:59,161 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 from AS
2024-08-15 04:37:00,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2024-08-15 04:37:02,025 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 24 from Vox, 22 from AS
2024-08-15 04:37:07,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11150, loss[loss=0.09308, beats_loss=0.01143, ecapa_loss=0.0001461, whisper_loss=0.0802, over 20229.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.09087, over 3875722.50 frames. ], batch size: 83, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:37:10,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-08-15 04:37:11,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3009850.0, ans=0.125
2024-08-15 04:37:43,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3010050.0, ans=0.125
2024-08-15 04:37:45,223 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 21 from Vox, 24 from AS
2024-08-15 04:37:54,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3010150.0, ans=0.125
2024-08-15 04:37:56,824 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 from AS
2024-08-15 04:38:19,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11200, loss[loss=0.07343, beats_loss=0.01265, ecapa_loss=0.0001721, whisper_loss=0.05906, over 18879.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001505, whisper_loss=0.09161, over 3859620.90 frames. ], batch size: 83, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:38:29,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0
2024-08-15 04:38:34,496 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 29 from Vox, 35 from AS
2024-08-15 04:38:37,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3010450.0, ans=0.125
2024-08-15 04:38:43,237 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 from AS
2024-08-15 04:39:02,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.11 vs. limit=22.5
2024-08-15 04:39:14,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3010650.0, ans=0.0
2024-08-15 04:39:15,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.332e+01 2.561e+01 2.829e+01 4.358e+01, threshold=5.122e+01, percent-clipped=0.0
2024-08-15 04:39:20,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0
2024-08-15 04:39:21,872 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS
2024-08-15 04:39:30,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0
2024-08-15 04:39:32,964 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 from AS
2024-08-15 04:39:33,978 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11250, loss[loss=0.08677, beats_loss=0.009438, ecapa_loss=0.0001379, whisper_loss=0.07595, over 20158.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001516, whisper_loss=0.0911, over 3864610.78 frames. ], batch size: 77, lr: 2.95e-03, grad_scale: 5.764607523034235e+17
2024-08-15 04:39:56,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3010950.0, ans=0.125
2024-08-15 04:40:00,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3010950.0, ans=0.5
2024-08-15 04:40:01,201 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 27 from Vox, 30 from AS
2024-08-15 04:40:17,755 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 from AS
2024-08-15 04:40:39,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3011250.0, ans=0.125
2024-08-15 04:40:45,442 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 from AS
2024-08-15 04:40:50,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11300, loss[loss=0.1247, beats_loss=0.01211, ecapa_loss=0.0001292, whisper_loss=0.1113, over 17060.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001507, whisper_loss=0.09144, over 3893800.00 frames. ], batch size: 64, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 04:41:05,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3011450.0, ans=0.125
2024-08-15 04:41:12,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0
2024-08-15 04:41:17,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=15.0
2024-08-15 04:41:38,443 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 from AS
2024-08-15 04:41:52,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.314e+01 2.562e+01 2.942e+01 5.561e+01, threshold=5.125e+01, percent-clipped=1.0
2024-08-15 04:42:04,348 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 from AS
2024-08-15 04:42:10,037 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11350, loss[loss=0.08525, beats_loss=0.01086, ecapa_loss=0.0001546, whisper_loss=0.07284, over 21361.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001509, whisper_loss=0.09076, over 3872052.25 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 04:42:16,211 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS
2024-08-15 04:42:17,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0
2024-08-15 04:42:26,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3011950.0, ans=0.125
2024-08-15 04:42:31,175 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 24 from Vox, 45 from AS
2024-08-15 04:42:34,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2024-08-15 04:42:37,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3011950.0, ans=0.1
2024-08-15 04:42:44,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3012050.0, ans=0.0
2024-08-15 04:42:48,504 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 29 from Vox, 40 from AS
2024-08-15 04:42:52,954 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 from AS
2024-08-15 04:42:53,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3012050.0, ans=0.1
2024-08-15 04:43:22,854 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 from AS
2024-08-15 04:43:25,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11400, loss[loss=0.1141, beats_loss=0.01103, ecapa_loss=8.786e-05, whisper_loss=0.1022, over 18887.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01055, ecapa_loss=0.0001518, whisper_loss=0.09164, over 3885725.84 frames. ], batch size: 67, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 04:44:04,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3012550.0, ans=0.04949747468305833
2024-08-15 04:44:22,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.412e+01 2.712e+01 2.971e+01 3.918e+01, threshold=5.424e+01, percent-clipped=0.0
2024-08-15 04:44:26,682 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 from AS
2024-08-15 04:44:35,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3012750.0, ans=0.125
2024-08-15 04:44:39,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11450, loss[loss=0.1077, beats_loss=0.01009, ecapa_loss=0.0001411, whisper_loss=0.09621, over 23884.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001512, whisper_loss=0.09116, over 3901731.16 frames. ], batch size: 93, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 04:45:29,587 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 from AS
2024-08-15 04:45:30,982 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 from AS
2024-08-15 04:45:35,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3013150.0, ans=0.125
2024-08-15 04:45:43,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3013250.0, ans=0.125
2024-08-15 04:45:51,533 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 from AS
2024-08-15 04:45:55,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11500, loss[loss=0.1117, beats_loss=0.01158, ecapa_loss=0.0001541, whisper_loss=0.09862, over 22305.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001512, whisper_loss=0.09129, over 3930316.96 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 04:45:57,710 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 from AS
2024-08-15 04:46:01,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3013350.0, ans=0.0
2024-08-15 04:46:14,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3013450.0, ans=0.125
2024-08-15 04:46:20,952 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 27 from LS+wenet, 27 from Vox, 42 from AS
2024-08-15 04:46:32,469 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 from AS
2024-08-15 04:46:33,829 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 from AS
2024-08-15 04:46:52,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.370e+01 2.550e+01 2.848e+01 7.027e+01, threshold=5.100e+01, percent-clipped=1.0
2024-08-15 04:47:01,028 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 34 from LS+wenet, 17 from Vox, 33 from AS
2024-08-15 04:47:08,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11550, loss[loss=0.1106, beats_loss=0.0086, ecapa_loss=0.0001619, whisper_loss=0.1003, over 19066.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001512, whisper_loss=0.09058, over 3891524.26 frames. ], batch size: 76, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 04:47:14,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3013850.0, ans=0.2
2024-08-15 04:47:14,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3013850.0, ans=0.125
2024-08-15 04:47:47,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0
2024-08-15 04:47:51,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3014050.0, ans=0.125
2024-08-15 04:47:56,461 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.320e-02
2024-08-15 04:48:19,012 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 10 from Vox, 27 from AS
2024-08-15 04:48:28,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3014350.0, ans=0.0
2024-08-15 04:48:29,164 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11600, loss[loss=0.08205, beats_loss=0.01139, ecapa_loss=0.0001468, whisper_loss=0.06919, over 15743.00 frames.
], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001518, whisper_loss=0.09069, over 3883652.49 frames. ], batch size: 64, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:48:51,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3014450.0, ans=0.0 2024-08-15 04:49:02,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3014550.0, ans=0.125 2024-08-15 04:49:31,680 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 04:49:32,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.366e+01 2.590e+01 2.931e+01 3.199e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-15 04:49:45,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3014750.0, ans=0.0 2024-08-15 04:49:49,121 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11650, loss[loss=0.1154, beats_loss=0.01105, ecapa_loss=0.0001208, whisper_loss=0.1031, over 23107.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001518, whisper_loss=0.09141, over 3911631.64 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:49:50,913 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 04:50:04,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3014950.0, ans=0.1 2024-08-15 04:50:20,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5 2024-08-15 04:50:31,690 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 04:50:38,155 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 04:50:53,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.39 vs. limit=6.0 2024-08-15 04:51:01,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2024-08-15 04:51:04,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3015250.0, ans=0.1 2024-08-15 04:51:05,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3015350.0, ans=0.125 2024-08-15 04:51:06,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11700, loss[loss=0.09333, beats_loss=0.01243, ecapa_loss=0.0001207, whisper_loss=0.07969, over 17979.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001526, whisper_loss=0.09087, over 3881547.78 frames. ], batch size: 70, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:51:32,924 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 04:51:37,529 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 04:51:39,236 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-15 04:51:48,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3015550.0, ans=0.2 2024-08-15 04:51:50,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3015550.0, ans=0.125 2024-08-15 04:51:57,488 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 04:51:57,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3015650.0, ans=0.05 2024-08-15 04:51:59,036 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 04:51:59,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-08-15 04:52:08,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.391e+01 2.584e+01 2.894e+01 1.234e+02, threshold=5.167e+01, percent-clipped=2.0 2024-08-15 04:52:26,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11750, loss[loss=0.09767, beats_loss=0.01138, ecapa_loss=0.0001077, whisper_loss=0.08522, over 15080.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001518, whisper_loss=0.09054, over 3894055.25 frames. ], batch size: 57, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:52:27,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3015850.0, ans=0.0 2024-08-15 04:52:33,247 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 04:52:43,475 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 04:52:43,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3015950.0, ans=0.2 2024-08-15 04:53:29,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3016150.0, ans=0.125 2024-08-15 04:53:38,413 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-15 04:53:47,783 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11800, loss[loss=0.1057, beats_loss=0.01218, ecapa_loss=0.0001029, whisper_loss=0.0925, over 16700.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001522, whisper_loss=0.09104, over 3905299.11 frames. ], batch size: 63, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:53:49,626 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 04:53:54,421 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 04:54:08,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3016450.0, ans=0.125 2024-08-15 04:54:17,533 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 04:54:22,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3016550.0, ans=0.1 2024-08-15 04:54:37,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.63 vs. 
limit=15.0 2024-08-15 04:54:47,710 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.361e+01 2.696e+01 3.037e+01 7.582e+01, threshold=5.392e+01, percent-clipped=2.0 2024-08-15 04:54:48,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3016750.0, ans=0.1 2024-08-15 04:55:01,292 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 04:55:05,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11850, loss[loss=0.1065, beats_loss=0.009753, ecapa_loss=0.0001729, whisper_loss=0.09505, over 20155.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001529, whisper_loss=0.09098, over 3918377.82 frames. ], batch size: 82, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:55:06,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3016850.0, ans=0.0 2024-08-15 04:55:17,592 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 04:55:28,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-15 04:55:29,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-15 04:55:36,733 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 04:55:44,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3017050.0, ans=0.125 2024-08-15 04:55:47,716 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 04:55:49,487 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 04:56:15,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-15 04:56:20,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11900, loss[loss=0.1138, beats_loss=0.01015, ecapa_loss=0.0001498, whisper_loss=0.1022, over 23201.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001527, whisper_loss=0.09104, over 3961179.20 frames. ], batch size: 93, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:56:22,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3017350.0, ans=0.1 2024-08-15 04:56:36,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2024-08-15 04:56:37,659 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 04:56:46,119 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 04:56:55,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3017550.0, ans=0.125 2024-08-15 04:57:01,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3017550.0, ans=0.0 2024-08-15 04:57:11,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3017650.0, ans=0.0 2024-08-15 04:57:20,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.277e+01 2.486e+01 2.850e+01 3.770e+01, threshold=4.972e+01, percent-clipped=0.0 2024-08-15 04:57:35,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 11950, loss[loss=0.09436, beats_loss=0.01018, ecapa_loss=0.0001805, whisper_loss=0.08238, over 14327.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001541, whisper_loss=0.091, over 3935003.38 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:57:38,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3017850.0, ans=0.125 2024-08-15 04:57:44,593 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 04:57:59,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.17 vs. limit=12.0 2024-08-15 04:58:06,756 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-15 04:58:08,061 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 04:58:16,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3018050.0, ans=0.0 2024-08-15 04:58:20,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3018150.0, ans=0.125 2024-08-15 04:58:48,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12000, loss[loss=0.1011, beats_loss=0.01288, ecapa_loss=0.0001154, whisper_loss=0.0871, over 20857.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001521, whisper_loss=0.09115, over 3918368.70 frames. ], batch size: 82, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:58:48,133 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 04:59:32,705 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005394, whisper_loss=0.2473, over 922467.00 frames. 2024-08-15 04:59:53,149 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on SV_voxceleb1: loss=0.004335, beats_loss=0, ecapa_loss=0.0004335, whisper_loss=0, over 939242.00 frames. 2024-08-15 05:01:54,668 INFO [train_multi_KD3.py:1149] (0/4) Epoch 21, validation on AT_audioset: loss=0.02336, beats_loss=0.02336, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 05:01:54,672 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 05:02:25,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=22.5 2024-08-15 05:02:28,924 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 05:02:30,260 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 05:02:40,091 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 05:02:50,210 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.329e+01 2.556e+01 2.882e+01 4.155e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 05:02:52,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2024-08-15 05:03:04,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12050, loss[loss=0.1136, beats_loss=0.007802, ecapa_loss=0.0001497, whisper_loss=0.1043, over 16920.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001514, whisper_loss=0.0908, over 3894011.80 frames. ], batch size: 64, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:03:28,582 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 05:03:34,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3019050.0, ans=0.125 2024-08-15 05:03:34,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. 
limit=15.0 2024-08-15 05:03:39,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3019050.0, ans=0.125 2024-08-15 05:03:43,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3019050.0, ans=0.125 2024-08-15 05:03:52,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3019150.0, ans=0.125 2024-08-15 05:03:57,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3019150.0, ans=0.0 2024-08-15 05:03:59,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3019250.0, ans=0.09899494936611666 2024-08-15 05:04:05,378 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 05:04:13,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12100, loss[loss=0.08946, beats_loss=0.0131, ecapa_loss=0.0001407, whisper_loss=0.07495, over 23215.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001515, whisper_loss=0.0902, over 3869715.35 frames. ], batch size: 95, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:04:14,694 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 05:04:30,568 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-15 05:04:42,207 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 05:04:44,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3019550.0, ans=0.125 2024-08-15 05:04:57,466 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 05:05:09,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=12.0 2024-08-15 05:05:09,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.350e+01 2.548e+01 2.785e+01 3.671e+01, threshold=5.096e+01, percent-clipped=0.0 2024-08-15 05:05:26,329 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12150, loss[loss=0.0849, beats_loss=0.01275, ecapa_loss=0.0001469, whisper_loss=0.07068, over 13945.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001523, whisper_loss=0.08982, over 3849323.85 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:05:36,136 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 05:05:36,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3019850.0, ans=0.1 2024-08-15 05:05:53,171 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-15 05:05:54,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3019950.0, ans=0.1 2024-08-15 05:05:56,486 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 05:06:02,832 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 05:06:39,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3020250.0, ans=0.0 2024-08-15 05:06:40,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.36 vs. 
limit=15.0 2024-08-15 05:06:43,436 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 05:06:46,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12200, loss[loss=0.09556, beats_loss=0.01179, ecapa_loss=0.0001675, whisper_loss=0.08209, over 18377.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001516, whisper_loss=0.08974, over 3823070.15 frames. ], batch size: 75, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:07:06,679 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-15 05:07:11,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=12.0 2024-08-15 05:07:18,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3020550.0, ans=0.125 2024-08-15 05:07:28,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3020550.0, ans=0.1 2024-08-15 05:07:31,250 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.458e-02 2024-08-15 05:07:45,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.310e+01 2.623e+01 3.026e+01 6.571e+01, threshold=5.245e+01, percent-clipped=3.0 2024-08-15 05:07:51,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3020750.0, ans=0.125 2024-08-15 05:07:57,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.92 vs. 
limit=10.0 2024-08-15 05:08:03,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12250, loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001647, whisper_loss=0.0904, over 18434.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001537, whisper_loss=0.08955, over 3851831.46 frames. ], batch size: 73, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:08:11,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3020850.0, ans=0.1 2024-08-15 05:08:17,955 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 05:08:19,452 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 05:08:21,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3020950.0, ans=0.0 2024-08-15 05:08:30,733 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 05:08:40,884 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.479e+00 2024-08-15 05:08:42,305 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 05:08:42,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3021050.0, ans=0.09899494936611666 2024-08-15 05:08:43,636 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-15 05:08:52,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3021150.0, ans=0.0 2024-08-15 05:08:56,898 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-15 05:09:04,739 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 05:09:04,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3021250.0, ans=0.125 2024-08-15 05:09:19,664 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 05:09:20,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12300, loss[loss=0.08386, beats_loss=0.0102, ecapa_loss=0.0001428, whisper_loss=0.07223, over 19033.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001525, whisper_loss=0.08959, over 3846698.69 frames. ], batch size: 77, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:09:49,162 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-15 05:09:51,519 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 05:09:59,817 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 05:10:07,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3021650.0, ans=0.0 2024-08-15 05:10:21,421 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 05:10:22,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.390e+01 2.646e+01 2.944e+01 2.237e+02, threshold=5.293e+01, percent-clipped=1.0 2024-08-15 05:10:34,482 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 05:10:38,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12350, loss[loss=0.08906, beats_loss=0.01151, ecapa_loss=0.0001696, whisper_loss=0.07585, over 18570.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001534, whisper_loss=0.08977, over 3849581.92 frames. 
], batch size: 74, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:10:46,243 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 05:10:51,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.15 vs. limit=10.0 2024-08-15 05:10:53,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3021950.0, ans=0.2 2024-08-15 05:11:00,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3021950.0, ans=0.0 2024-08-15 05:11:09,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3022050.0, ans=15.0 2024-08-15 05:11:28,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3022150.0, ans=0.0 2024-08-15 05:11:28,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3022150.0, ans=0.09899494936611666 2024-08-15 05:11:32,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3022150.0, ans=0.125 2024-08-15 05:11:43,783 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-15 05:11:46,286 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 37 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-15 05:11:46,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. 
limit=15.0 2024-08-15 05:11:48,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3022250.0, ans=10.0 2024-08-15 05:11:50,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12400, loss[loss=0.07656, beats_loss=0.0124, ecapa_loss=0.0001587, whisper_loss=0.06258, over 19085.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001518, whisper_loss=0.09032, over 3875566.32 frames. ], batch size: 80, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:11:56,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3022350.0, ans=0.125 2024-08-15 05:11:57,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3022350.0, ans=0.125 2024-08-15 05:12:01,397 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-15 05:12:05,406 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 05:12:08,319 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 05:12:18,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3022550.0, ans=0.95 2024-08-15 05:12:30,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3022650.0, ans=0.125 2024-08-15 05:12:41,722 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 05:12:42,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.329e+01 2.587e+01 2.851e+01 3.829e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-15 05:12:55,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3022750.0, ans=0.125 2024-08-15 05:12:58,007 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12450, loss[loss=0.1077, beats_loss=0.01267, ecapa_loss=0.0001252, whisper_loss=0.09381, over 16610.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001514, whisper_loss=0.09041, over 3857162.89 frames. ], batch size: 67, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:12:58,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3022850.0, ans=0.1 2024-08-15 05:13:32,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3023050.0, ans=0.0 2024-08-15 05:13:36,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3023050.0, ans=0.0 2024-08-15 05:14:04,746 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12500, loss[loss=0.1186, beats_loss=0.009188, ecapa_loss=0.0001481, whisper_loss=0.1079, over 19741.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001519, whisper_loss=0.09084, over 3883506.29 frames. ], batch size: 75, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:14:06,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.99 vs. limit=15.0 2024-08-15 05:14:09,861 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 05:14:24,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3023450.0, ans=0.2 2024-08-15 05:14:31,926 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 05:14:32,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-08-15 05:14:35,873 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 05:14:39,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3023550.0, ans=0.0 2024-08-15 05:14:43,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-08-15 05:14:57,795 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.569e+01 2.941e+01 3.163e+02, threshold=5.138e+01, percent-clipped=2.0 2024-08-15 05:15:00,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3023750.0, ans=0.125 2024-08-15 05:15:12,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12550, loss[loss=0.1221, beats_loss=0.01002, ecapa_loss=0.0001888, whisper_loss=0.1102, over 18363.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001511, whisper_loss=0.09065, over 3910193.77 frames. ], batch size: 74, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:15:19,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3023850.0, ans=22.5 2024-08-15 05:15:31,334 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-15 05:16:06,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-15 05:16:13,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3024250.0, ans=0.125 2024-08-15 05:16:20,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12600, loss[loss=0.1341, beats_loss=0.00794, ecapa_loss=0.0001769, whisper_loss=0.1244, over 23023.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001513, whisper_loss=0.09058, over 3909421.58 frames. ], batch size: 89, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:16:24,274 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 05:16:29,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3024350.0, ans=0.0 2024-08-15 05:16:32,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3024450.0, ans=0.125 2024-08-15 05:16:36,528 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 05:16:44,795 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 17 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-15 05:16:54,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3024550.0, ans=0.125 2024-08-15 05:17:06,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3024650.0, ans=0.1 2024-08-15 05:17:07,517 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 05:17:08,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3024650.0, ans=0.125 2024-08-15 05:17:13,737 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.257e+01 2.680e+01 2.970e+01 2.910e+02, threshold=5.361e+01, percent-clipped=1.0 2024-08-15 05:17:17,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=15.0 2024-08-15 05:17:27,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12650, loss[loss=0.06672, beats_loss=0.01411, ecapa_loss=0.000111, whisper_loss=0.0515, over 17955.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001525, whisper_loss=0.09051, over 3922391.08 frames. ], batch size: 69, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:17:27,417 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 05:17:33,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3024850.0, ans=0.125 2024-08-15 05:17:35,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-15 05:17:39,285 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 23 from Vox, 16 fro AS 2024-08-15 05:17:42,180 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 05:17:44,957 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 05:17:53,813 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 05:18:19,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3025250.0, ans=0.125 2024-08-15 05:18:22,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3025250.0, ans=0.0 2024-08-15 05:18:28,390 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-15 05:18:33,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12700, loss[loss=0.08934, beats_loss=0.009844, ecapa_loss=0.0001559, whisper_loss=0.07794, over 14880.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001518, whisper_loss=0.09111, over 3907543.11 frames. ], batch size: 60, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:18:33,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3025350.0, ans=0.125 2024-08-15 05:18:34,779 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 05:18:37,978 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:19:06,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3025550.0, ans=0.0 2024-08-15 05:19:26,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.352e+01 2.609e+01 2.982e+01 1.854e+02, threshold=5.218e+01, percent-clipped=2.0 2024-08-15 05:19:32,188 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 05:19:32,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3025750.0, ans=0.125 2024-08-15 05:19:39,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12750, loss[loss=0.1031, beats_loss=0.01104, ecapa_loss=0.0001208, whisper_loss=0.09088, over 19097.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001524, whisper_loss=0.0904, over 3890852.93 frames. ], batch size: 74, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:19:52,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3025950.0, ans=0.1 2024-08-15 05:19:53,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3025950.0, ans=0.0 2024-08-15 05:20:02,617 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 05:20:05,755 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 05:20:14,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3026050.0, ans=0.125 2024-08-15 05:20:20,805 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
24 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-15 05:20:22,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3026150.0, ans=0.0 2024-08-15 05:20:24,894 WARNING [optim.py:496] (0/4) Scaling gradients by 0.023750245571136475, model_norm_threshold=52.18341064453125 2024-08-15 05:20:25,075 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.496e+05, grad_sumsq=7.496e+05, orig_rms_sq=1.000e+00 2024-08-15 05:20:45,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12800, loss[loss=0.08056, beats_loss=0.01397, ecapa_loss=0.0001291, whisper_loss=0.0653, over 17527.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001555, whisper_loss=0.09006, over 3897574.77 frames. ], batch size: 70, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:21:20,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3026550.0, ans=0.0 2024-08-15 05:21:27,278 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 05:21:34,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3026650.0, ans=0.125 2024-08-15 05:21:39,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.369e+01 2.658e+01 2.978e+01 2.197e+03, threshold=5.317e+01, percent-clipped=3.0 2024-08-15 05:21:39,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3026750.0, ans=0.125 2024-08-15 05:21:48,938 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-15 05:21:52,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12850, loss[loss=0.1224, beats_loss=0.009184, ecapa_loss=0.0001698, whisper_loss=0.1116, over 21517.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001563, whisper_loss=0.0902, over 3880404.15 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:21:52,792 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 05:21:53,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3026850.0, ans=0.0 2024-08-15 05:21:54,460 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 05:22:05,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3026950.0, ans=0.125 2024-08-15 05:22:08,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2024-08-15 05:22:24,944 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 05:22:26,189 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 05:22:31,765 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 05:22:50,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3027250.0, ans=0.0 2024-08-15 05:22:54,317 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 05:22:59,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12900, loss[loss=0.09884, beats_loss=0.01138, ecapa_loss=0.0001413, whisper_loss=0.08605, over 23136.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001561, whisper_loss=0.08964, over 3858681.43 frames. ], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:23:00,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3027350.0, ans=0.1 2024-08-15 05:23:14,562 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-15 05:23:25,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3027550.0, ans=0.125 2024-08-15 05:23:29,999 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.977e-01 2024-08-15 05:23:31,064 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 05:23:33,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3027550.0, ans=0.0 2024-08-15 05:23:42,746 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 05:23:53,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.303e+01 2.501e+01 2.765e+01 4.358e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-15 05:24:00,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3027750.0, ans=0.0 2024-08-15 05:24:06,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 12950, loss[loss=0.107, beats_loss=0.01024, ecapa_loss=0.0001589, whisper_loss=0.09516, over 20535.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001544, whisper_loss=0.08992, over 3839021.65 frames. 
], batch size: 83, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:24:13,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3027850.0, ans=0.0 2024-08-15 05:24:13,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3027850.0, ans=0.0 2024-08-15 05:24:25,926 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 05:24:38,004 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 05:24:45,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3028150.0, ans=0.0 2024-08-15 05:25:01,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3028250.0, ans=0.125 2024-08-15 05:25:02,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3028250.0, ans=0.2 2024-08-15 05:25:10,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3028250.0, ans=0.0 2024-08-15 05:25:13,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13000, loss[loss=0.114, beats_loss=0.01028, ecapa_loss=0.0001678, whisper_loss=0.102, over 22719.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001539, whisper_loss=0.08996, over 3901089.76 frames. 
], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:25:19,445 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.096e-03 2024-08-15 05:25:31,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3028450.0, ans=0.1 2024-08-15 05:25:37,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3028450.0, ans=0.125 2024-08-15 05:25:42,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2024-08-15 05:25:50,147 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 05:25:51,566 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 05:26:07,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.449e+01 2.686e+01 3.098e+01 1.940e+02, threshold=5.373e+01, percent-clipped=2.0 2024-08-15 05:26:09,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3028750.0, ans=0.2 2024-08-15 05:26:17,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3028750.0, ans=0.125 2024-08-15 05:26:21,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13050, loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001422, whisper_loss=0.08978, over 22505.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001542, whisper_loss=0.09031, over 3897205.40 frames. 
], batch size: 90, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:26:25,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-15 05:26:29,488 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 05:26:33,663 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 05:26:48,094 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 05:26:53,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3029050.0, ans=0.125 2024-08-15 05:27:00,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3029050.0, ans=0.1 2024-08-15 05:27:03,007 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-15 05:27:09,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-15 05:27:14,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3029150.0, ans=0.125 2024-08-15 05:27:30,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3029250.0, ans=0.1 2024-08-15 05:27:33,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13100, loss[loss=0.1055, beats_loss=0.0102, ecapa_loss=0.0001287, whisper_loss=0.09398, over 19124.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001539, whisper_loss=0.08993, over 3872735.72 frames. 
], batch size: 72, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:28:16,846 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 05:28:17,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3029550.0, ans=0.125 2024-08-15 05:28:36,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.367e+01 2.624e+01 3.031e+01 1.630e+02, threshold=5.247e+01, percent-clipped=4.0 2024-08-15 05:28:41,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3029750.0, ans=0.0 2024-08-15 05:28:50,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13150, loss[loss=0.07275, beats_loss=0.012, ecapa_loss=0.0001493, whisper_loss=0.05925, over 16144.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001505, whisper_loss=0.09024, over 3869492.82 frames. ], batch size: 66, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:28:59,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.67 vs. 
limit=22.5 2024-08-15 05:29:05,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3029950.0, ans=0.0 2024-08-15 05:29:13,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3029950.0, ans=0.125 2024-08-15 05:29:28,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3030050.0, ans=0.0 2024-08-15 05:29:46,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3030150.0, ans=0.0 2024-08-15 05:30:09,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13200, loss[loss=0.09319, beats_loss=0.009574, ecapa_loss=0.0001805, whisper_loss=0.08181, over 13928.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001517, whisper_loss=0.09076, over 3849038.42 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:30:12,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-15 05:30:28,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3030450.0, ans=0.125 2024-08-15 05:30:31,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3030450.0, ans=0.0 2024-08-15 05:30:37,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3030450.0, ans=0.04949747468305833 2024-08-15 05:30:44,366 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 05:31:01,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3030650.0, ans=0.125 2024-08-15 05:31:04,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3030650.0, ans=0.2 2024-08-15 05:31:11,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.273e+01 2.515e+01 2.855e+01 4.648e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-15 05:31:14,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-15 05:31:15,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2024-08-15 05:31:26,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13250, loss[loss=0.1423, beats_loss=0.006642, ecapa_loss=0.0001413, whisper_loss=0.1343, over 14792.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.000152, whisper_loss=0.0906, over 3862200.29 frames. ], batch size: 54, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:31:28,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3030850.0, ans=0.125 2024-08-15 05:31:34,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3030850.0, ans=0.09899494936611666 2024-08-15 05:31:36,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.57 vs. 
limit=15.0 2024-08-15 05:31:37,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3030850.0, ans=0.125 2024-08-15 05:32:10,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3031050.0, ans=0.125 2024-08-15 05:32:16,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.91 vs. limit=15.0 2024-08-15 05:32:34,748 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-15 05:32:41,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13300, loss[loss=0.1031, beats_loss=0.009505, ecapa_loss=0.0001652, whisper_loss=0.0919, over 22215.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001514, whisper_loss=0.08995, over 3836222.19 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:32:48,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3031350.0, ans=0.1 2024-08-15 05:32:50,040 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 30 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 05:32:51,524 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 34 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 05:33:06,071 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 05:33:09,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=15.0 2024-08-15 05:33:16,569 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 05:33:18,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3031550.0, ans=0.125 2024-08-15 05:33:25,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3031650.0, ans=0.125 2024-08-15 05:33:41,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.389e+01 2.602e+01 2.951e+01 3.808e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-15 05:33:47,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3031750.0, ans=0.125 2024-08-15 05:33:52,452 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-15 05:33:55,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13350, loss[loss=0.1256, beats_loss=0.01039, ecapa_loss=0.0001302, whisper_loss=0.1139, over 16790.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09062, over 3843662.68 frames. ], batch size: 67, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:34:12,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3031950.0, ans=0.05 2024-08-15 05:34:12,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. 
limit=15.0 2024-08-15 05:34:22,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3032050.0, ans=0.125 2024-08-15 05:34:31,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3032050.0, ans=0.125 2024-08-15 05:34:31,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3032050.0, ans=0.09899494936611666 2024-08-15 05:34:35,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3032050.0, ans=0.0 2024-08-15 05:34:42,325 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 05:34:48,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=12.0 2024-08-15 05:35:06,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13400, loss[loss=0.1153, beats_loss=0.008608, ecapa_loss=0.0001741, whisper_loss=0.1049, over 17326.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001499, whisper_loss=0.09027, over 3836060.62 frames. ], batch size: 71, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:35:18,902 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.242e+01 2024-08-15 05:35:20,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3032450.0, ans=0.0 2024-08-15 05:35:37,039 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-15 05:35:46,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3032650.0, ans=0.125 2024-08-15 05:36:02,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.320e+01 2.582e+01 2.828e+01 6.062e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-15 05:36:06,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3032750.0, ans=0.2 2024-08-15 05:36:14,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-15 05:36:18,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13450, loss[loss=0.08188, beats_loss=0.01173, ecapa_loss=0.0001679, whisper_loss=0.06847, over 22413.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.00015, whisper_loss=0.09018, over 3837616.91 frames. ], batch size: 94, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:36:42,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=12.0 2024-08-15 05:36:53,393 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 05:37:03,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3033150.0, ans=0.125 2024-08-15 05:37:10,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3033150.0, ans=0.0 2024-08-15 05:37:17,591 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 36 from Vox, 29 fro AS 2024-08-15 05:37:22,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3033250.0, ans=0.2 2024-08-15 05:37:23,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3033250.0, ans=0.125 2024-08-15 05:37:29,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3033350.0, ans=0.2 2024-08-15 05:37:30,438 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13500, loss[loss=0.1208, beats_loss=0.00919, ecapa_loss=0.0001672, whisper_loss=0.1099, over 20701.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001523, whisper_loss=0.09002, over 3844085.39 frames. ], batch size: 82, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:37:32,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3033350.0, ans=0.04949747468305833 2024-08-15 05:37:33,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3033350.0, ans=0.125 2024-08-15 05:37:35,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3033350.0, ans=0.0 2024-08-15 05:37:40,520 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
18 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 05:37:59,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3033550.0, ans=0.1 2024-08-15 05:38:14,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3033650.0, ans=0.125 2024-08-15 05:38:26,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.315e+01 2.561e+01 2.861e+01 3.892e+01, threshold=5.123e+01, percent-clipped=0.0 2024-08-15 05:38:35,438 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 05:38:41,136 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13550, loss[loss=0.1216, beats_loss=0.01024, ecapa_loss=0.0001439, whisper_loss=0.1099, over 19465.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01053, ecapa_loss=0.0001526, whisper_loss=0.09117, over 3866642.19 frames. ], batch size: 76, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:38:50,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3033850.0, ans=0.0 2024-08-15 05:38:53,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3033850.0, ans=0.2 2024-08-15 05:38:55,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3033950.0, ans=0.0 2024-08-15 05:38:57,933 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 05:39:03,923 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 05:39:09,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3034050.0, ans=0.025 2024-08-15 05:39:15,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3034050.0, ans=0.125 2024-08-15 05:39:17,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3034050.0, ans=0.0 2024-08-15 05:39:22,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3034050.0, ans=0.015 2024-08-15 05:39:23,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3034150.0, ans=0.0 2024-08-15 05:39:36,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.23 vs. limit=22.5 2024-08-15 05:39:38,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3034250.0, ans=0.125 2024-08-15 05:39:40,670 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 05:39:47,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3034250.0, ans=0.0 2024-08-15 05:39:53,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13600, loss[loss=0.1104, beats_loss=0.01224, ecapa_loss=0.0001126, whisper_loss=0.09699, over 18181.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001507, whisper_loss=0.09113, over 3872870.10 frames. 
], batch size: 68, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:40:19,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3034450.0, ans=0.125 2024-08-15 05:40:24,524 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 05:40:34,072 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 05:40:37,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3034650.0, ans=0.0 2024-08-15 05:40:46,465 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-15 05:40:53,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.277e+01 2.545e+01 2.819e+01 3.866e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 05:40:59,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3034750.0, ans=0.2 2024-08-15 05:41:01,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3034750.0, ans=0.2 2024-08-15 05:41:08,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13650, loss[loss=0.0918, beats_loss=0.01245, ecapa_loss=0.0001475, whisper_loss=0.07787, over 16301.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.000151, whisper_loss=0.09113, over 3911045.61 frames. ], batch size: 66, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:41:23,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3034950.0, ans=0.0 2024-08-15 05:41:25,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.55 vs. 
limit=15.0 2024-08-15 05:41:43,595 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 05:42:09,408 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 10 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 05:42:11,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-15 05:42:22,458 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13700, loss[loss=0.08237, beats_loss=0.01084, ecapa_loss=0.0001311, whisper_loss=0.07022, over 14864.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001511, whisper_loss=0.09084, over 3900892.81 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:42:24,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3035350.0, ans=0.125 2024-08-15 05:42:41,128 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 05:42:41,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-15 05:42:47,623 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-15 05:42:54,403 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 05:42:59,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3035550.0, ans=0.0 2024-08-15 05:43:11,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3035650.0, ans=0.125 2024-08-15 05:43:14,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3035650.0, ans=0.0 2024-08-15 05:43:23,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.259e+01 2.458e+01 2.754e+01 9.155e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-15 05:43:24,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2024-08-15 05:43:38,634 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13750, loss[loss=0.1159, beats_loss=0.01112, ecapa_loss=0.0001036, whisper_loss=0.1038, over 23898.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001514, whisper_loss=0.09096, over 3859388.03 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:43:47,248 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 05:43:52,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3035850.0, ans=0.125 2024-08-15 05:43:55,068 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
35 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 05:44:02,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3035950.0, ans=0.05 2024-08-15 05:44:19,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3036050.0, ans=0.125 2024-08-15 05:44:59,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3036350.0, ans=0.0 2024-08-15 05:45:00,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13800, loss[loss=0.1167, beats_loss=0.01035, ecapa_loss=0.0001449, whisper_loss=0.1049, over 23003.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01049, ecapa_loss=0.0001503, whisper_loss=0.09193, over 3880839.05 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:45:12,888 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 05:45:36,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3036550.0, ans=0.0 2024-08-15 05:46:01,255 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 05:46:03,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3036650.0, ans=0.1 2024-08-15 05:46:05,925 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.299e+01 2.505e+01 2.770e+01 3.939e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-15 05:46:12,031 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-15 05:46:22,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13850, loss[loss=0.1046, beats_loss=0.00871, ecapa_loss=0.0001782, whisper_loss=0.09408, over 16822.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.000151, whisper_loss=0.09205, over 3874856.58 frames. ], batch size: 68, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:46:31,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-15 05:46:33,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2024-08-15 05:47:17,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3037150.0, ans=0.0 2024-08-15 05:47:23,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3037150.0, ans=0.05 2024-08-15 05:47:28,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3037250.0, ans=0.125 2024-08-15 05:47:28,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3037250.0, ans=0.125 2024-08-15 05:47:30,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-15 05:47:41,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3037350.0, ans=0.0 2024-08-15 05:47:42,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13900, loss[loss=0.1248, beats_loss=0.01028, ecapa_loss=0.0001612, whisper_loss=0.1129, over 15662.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01045, ecapa_loss=0.0001507, whisper_loss=0.09223, over 3855409.31 frames. 
], batch size: 60, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:47:45,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3037350.0, ans=0.125 2024-08-15 05:48:07,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3037450.0, ans=0.125 2024-08-15 05:48:39,418 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 05:48:44,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2024-08-15 05:48:49,088 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05059259384870529, model_norm_threshold=50.10878372192383 2024-08-15 05:48:49,292 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.601e+05, grad_sumsq=3.601e+05, orig_rms_sq=1.000e+00 2024-08-15 05:48:51,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.379e+01 2.607e+01 2.936e+01 9.904e+02, threshold=5.213e+01, percent-clipped=4.0 2024-08-15 05:49:01,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-15 05:49:06,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 13950, loss[loss=0.1272, beats_loss=0.01029, ecapa_loss=0.0001571, whisper_loss=0.1154, over 23180.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01045, ecapa_loss=0.0001506, whisper_loss=0.09214, over 3845218.06 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:49:11,048 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 05:49:37,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3037950.0, ans=0.0 2024-08-15 05:50:05,964 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 05:50:11,909 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 05:50:20,525 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-15 05:50:33,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3038250.0, ans=0.125 2024-08-15 05:50:37,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3038250.0, ans=0.1 2024-08-15 05:50:37,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3038250.0, ans=0.2 2024-08-15 05:50:44,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14000, loss[loss=0.1168, beats_loss=0.0102, ecapa_loss=0.0001897, whisper_loss=0.1047, over 18078.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001502, whisper_loss=0.09127, over 3877060.32 frames. ], batch size: 71, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:50:50,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. 
limit=6.0 2024-08-15 05:50:52,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3038350.0, ans=0.2 2024-08-15 05:50:52,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3038350.0, ans=0.125 2024-08-15 05:51:00,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3038350.0, ans=0.125 2024-08-15 05:51:03,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3038350.0, ans=0.0 2024-08-15 05:51:36,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.23 vs. limit=10.0 2024-08-15 05:51:48,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3038650.0, ans=0.0 2024-08-15 05:51:58,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3038650.0, ans=0.2 2024-08-15 05:52:11,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.343e+01 2.615e+01 2.930e+01 6.184e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-15 05:52:35,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14050, loss[loss=0.1034, beats_loss=0.009989, ecapa_loss=0.0001529, whisper_loss=0.09185, over 16981.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001495, whisper_loss=0.09139, over 3883047.83 frames. ], batch size: 64, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:53:00,999 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 05:53:04,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3038950.0, ans=0.0 2024-08-15 05:53:34,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3039150.0, ans=0.125 2024-08-15 05:53:41,238 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 05:53:52,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3039150.0, ans=0.2 2024-08-15 05:53:54,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3039250.0, ans=0.1 2024-08-15 05:54:03,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3039250.0, ans=0.125 2024-08-15 05:54:15,959 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 05:54:18,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14100, loss[loss=0.08775, beats_loss=0.01111, ecapa_loss=0.0001444, whisper_loss=0.0752, over 21295.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001495, whisper_loss=0.09125, over 3909470.48 frames. ], batch size: 87, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:54:27,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.16 vs. 
limit=22.5 2024-08-15 05:54:30,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3039350.0, ans=0.2 2024-08-15 05:54:45,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-15 05:55:21,245 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 05:55:24,038 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 05:55:27,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.350e+01 2.664e+01 3.020e+01 1.564e+02, threshold=5.328e+01, percent-clipped=1.0 2024-08-15 05:55:36,763 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-15 05:55:41,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14150, loss[loss=0.1203, beats_loss=0.01191, ecapa_loss=0.000169, whisper_loss=0.1067, over 15448.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001503, whisper_loss=0.09089, over 3867326.30 frames. ], batch size: 63, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:55:51,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3039850.0, ans=0.125 2024-08-15 05:56:01,238 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-304000.pt 2024-08-15 05:56:05,504 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
22 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 05:56:14,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3040050.0, ans=0.125 2024-08-15 05:56:34,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3040150.0, ans=0.125 2024-08-15 05:56:39,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3040150.0, ans=0.0 2024-08-15 05:56:43,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3040250.0, ans=0.05 2024-08-15 05:56:49,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3040250.0, ans=0.04949747468305833 2024-08-15 05:56:58,512 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14200, loss[loss=0.1211, beats_loss=0.007604, ecapa_loss=0.0001722, whisper_loss=0.1118, over 21769.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001495, whisper_loss=0.09112, over 3884061.50 frames. ], batch size: 88, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:57:02,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-15 05:57:04,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3040350.0, ans=0.125 2024-08-15 05:57:08,282 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 05:57:29,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3040550.0, ans=0.0 2024-08-15 05:57:29,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3040550.0, ans=0.125 2024-08-15 05:57:31,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3040550.0, ans=0.1 2024-08-15 05:57:32,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3040550.0, ans=0.0 2024-08-15 05:57:35,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3040550.0, ans=0.125 2024-08-15 05:58:00,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.326e+01 2.592e+01 2.924e+01 6.304e+01, threshold=5.183e+01, percent-clipped=1.0 2024-08-15 05:58:02,107 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 05:58:02,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3040750.0, ans=0.125 2024-08-15 05:58:08,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3040750.0, ans=0.1 2024-08-15 05:58:14,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3040850.0, ans=0.125 2024-08-15 05:58:15,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14250, loss[loss=0.1112, beats_loss=0.01162, ecapa_loss=0.0001264, whisper_loss=0.0983, over 18616.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001482, whisper_loss=0.09133, over 3892765.06 frames. 
], batch size: 73, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:58:15,586 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 05:58:28,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3040850.0, ans=0.1 2024-08-15 05:58:37,270 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 05:58:57,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3041050.0, ans=0.2 2024-08-15 05:59:06,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3041150.0, ans=0.125 2024-08-15 05:59:08,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-08-15 05:59:10,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3041150.0, ans=15.0 2024-08-15 05:59:23,821 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 05:59:32,034 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 05:59:36,771 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14300, loss[loss=0.09323, beats_loss=0.01174, ecapa_loss=0.0001564, whisper_loss=0.07992, over 19042.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001487, whisper_loss=0.09132, over 3904125.87 frames. ], batch size: 78, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:59:41,840 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 05:59:46,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0 2024-08-15 05:59:50,027 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 06:00:25,757 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 06:00:35,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3041650.0, ans=0.0 2024-08-15 06:00:36,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3041650.0, ans=0.04949747468305833 2024-08-15 06:00:42,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3041750.0, ans=0.125 2024-08-15 06:00:44,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.480e+01 2.675e+01 2.988e+01 3.150e+02, threshold=5.350e+01, percent-clipped=2.0 2024-08-15 06:00:50,158 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 06:00:53,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3041750.0, ans=0.0 2024-08-15 06:01:00,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3041850.0, ans=0.125 2024-08-15 06:01:01,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14350, loss[loss=0.09164, beats_loss=0.01021, ecapa_loss=0.0001891, whisper_loss=0.07954, over 19126.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001492, whisper_loss=0.09067, over 3872288.76 frames. 
], batch size: 79, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:01:16,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3041950.0, ans=0.0 2024-08-15 06:01:34,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3042050.0, ans=0.125 2024-08-15 06:02:19,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14400, loss[loss=0.1095, beats_loss=0.01157, ecapa_loss=0.0001406, whisper_loss=0.09652, over 22463.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001501, whisper_loss=0.09077, over 3902860.46 frames. ], batch size: 89, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:02:38,646 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 06:03:03,341 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 06:03:03,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-15 06:03:08,419 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.514e+01 2024-08-15 06:03:11,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3042650.0, ans=0.0 2024-08-15 06:03:13,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3042650.0, ans=0.1 2024-08-15 06:03:21,267 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
33 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-15 06:03:23,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.353e+01 2.673e+01 3.020e+01 3.990e+01, threshold=5.347e+01, percent-clipped=0.0 2024-08-15 06:03:40,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 21, batch 14450, loss[loss=0.0871, beats_loss=0.01183, ecapa_loss=0.0001503, whisper_loss=0.07377, over 22308.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001517, whisper_loss=0.09131, over 3906531.04 frames. ], batch size: 94, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:03:52,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-15 06:03:57,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3042950.0, ans=0.0 2024-08-15 06:03:57,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3042950.0, ans=0.125 2024-08-15 06:04:08,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3042950.0, ans=0.125 2024-08-15 06:04:21,531 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 31 from Vox, 19 fro AS 2024-08-15 06:04:26,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3043050.0, ans=0.125 2024-08-15 06:04:30,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3043150.0, ans=0.0 2024-08-15 06:04:49,866 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-21.pt 2024-08-15 06:05:22,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 0, loss[loss=0.07552, beats_loss=0.01052, ecapa_loss=0.0001451, whisper_loss=0.06354, over 16177.00 frames. ], tot_loss[loss=0.07552, beats_loss=0.01052, ecapa_loss=0.0001451, whisper_loss=0.06354, over 16177.00 frames. ], batch size: 62, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:05:22,782 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 06:06:01,370 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005383, whisper_loss=0.2468, over 922467.00 frames. 2024-08-15 06:06:18,397 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on SV_voxceleb1: loss=0.004241, beats_loss=0, ecapa_loss=0.0004241, whisper_loss=0, over 939242.00 frames. 2024-08-15 06:08:04,722 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 06:08:04,725 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 06:08:08,328 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-15 06:08:11,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3043270.0, ans=0.125 2024-08-15 06:08:19,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3043270.0, ans=0.1 2024-08-15 06:08:48,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3043370.0, ans=0.125 2024-08-15 06:09:00,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-08-15 06:09:20,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-15 06:09:39,093 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 26 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-15 06:10:00,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.592e+01 2.838e+01 3.156e+01 2.932e+02, threshold=5.677e+01, percent-clipped=2.0 2024-08-15 06:10:05,680 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 50, loss[loss=0.0948, beats_loss=0.00964, ecapa_loss=0.0001692, whisper_loss=0.08347, over 19699.00 frames. ], tot_loss[loss=0.09941, beats_loss=0.009974, ecapa_loss=0.0001547, whisper_loss=0.08789, over 892056.06 frames. ], batch size: 84, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:10:11,331 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 06:10:41,350 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 06:10:43,512 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 06:11:07,000 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 06:11:29,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3044070.0, ans=0.125 2024-08-15 06:11:46,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3044170.0, ans=15.0 2024-08-15 06:11:48,573 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 06:11:49,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3044170.0, ans=0.0 2024-08-15 06:11:57,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 100, loss[loss=0.104, beats_loss=0.008469, ecapa_loss=0.0001646, whisper_loss=0.09389, over 19053.00 frames. ], tot_loss[loss=0.102, beats_loss=0.009542, ecapa_loss=0.0001547, whisper_loss=0.09086, over 1554887.97 frames. ], batch size: 76, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:12:03,573 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 06:12:20,817 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 06:12:36,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3044370.0, ans=0.125 2024-08-15 06:12:46,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3044470.0, ans=0.1 2024-08-15 06:12:58,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3044470.0, ans=0.0 2024-08-15 06:13:14,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3044570.0, ans=0.125 2024-08-15 06:13:20,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=22.5 2024-08-15 06:13:35,609 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 06:13:49,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.691e+01 2.918e+01 3.263e+01 8.817e+01, threshold=5.837e+01, percent-clipped=1.0 2024-08-15 06:13:54,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 150, loss[loss=0.08188, beats_loss=0.01121, ecapa_loss=0.0001544, whisper_loss=0.06913, over 15554.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.00955, ecapa_loss=0.0001543, whisper_loss=0.09137, over 2094252.45 frames. ], batch size: 64, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:14:16,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3044870.0, ans=0.0 2024-08-15 06:14:27,003 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 06:14:31,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-08-15 06:14:32,413 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 06:14:36,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3044970.0, ans=0.0 2024-08-15 06:15:01,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3045070.0, ans=0.125 2024-08-15 06:15:28,638 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 200, loss[loss=0.1165, beats_loss=0.009513, ecapa_loss=0.0001166, whisper_loss=0.1059, over 20981.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.009845, ecapa_loss=0.0001535, whisper_loss=0.09078, over 2481437.18 frames. ], batch size: 76, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:15:37,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3045270.0, ans=0.07 2024-08-15 06:15:42,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=22.5 2024-08-15 06:15:45,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3045370.0, ans=0.1 2024-08-15 06:16:09,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3045470.0, ans=0.0 2024-08-15 06:16:20,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3045570.0, ans=0.0 2024-08-15 06:16:26,332 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 06:16:28,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=12.0 2024-08-15 06:16:44,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.328e+01 2.566e+01 2.862e+01 5.342e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-15 06:16:44,729 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 06:16:47,428 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 250, loss[loss=0.1069, beats_loss=0.009022, ecapa_loss=0.0001708, whisper_loss=0.09621, over 18620.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.009914, ecapa_loss=0.0001516, whisper_loss=0.09242, over 2775790.91 frames. ], batch size: 75, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:16:52,204 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 06:17:04,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3045870.0, ans=0.05 2024-08-15 06:17:22,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3045970.0, ans=0.0 2024-08-15 06:17:31,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3045970.0, ans=0.0 2024-08-15 06:17:43,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.78 vs. 
limit=15.0 2024-08-15 06:18:01,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3046170.0, ans=0.125 2024-08-15 06:18:05,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 300, loss[loss=0.07124, beats_loss=0.01088, ecapa_loss=0.0001635, whisper_loss=0.05872, over 16771.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01007, ecapa_loss=0.0001504, whisper_loss=0.09162, over 3000931.29 frames. ], batch size: 66, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:18:10,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3046270.0, ans=0.0 2024-08-15 06:18:10,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3046270.0, ans=0.125 2024-08-15 06:18:22,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046370.0, ans=0.1 2024-08-15 06:18:25,732 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 27 from LS+wenet, 14 from Vox, 14 fro AS 2024-08-15 06:18:53,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3046570.0, ans=0.125 2024-08-15 06:19:01,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3046570.0, ans=0.05 2024-08-15 06:19:10,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. 
limit=8.0 2024-08-15 06:19:11,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3046670.0, ans=0.0 2024-08-15 06:19:19,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.288e+01 2.599e+01 2.904e+01 1.999e+02, threshold=5.198e+01, percent-clipped=4.0 2024-08-15 06:19:22,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 350, loss[loss=0.08275, beats_loss=0.01259, ecapa_loss=0.0001323, whisper_loss=0.06884, over 22154.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01011, ecapa_loss=0.0001509, whisper_loss=0.09134, over 3157611.16 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:19:24,527 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.828e-03 2024-08-15 06:19:28,891 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-15 06:19:36,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3046870.0, ans=0.125 2024-08-15 06:19:49,208 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 06:19:53,397 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-15 06:19:55,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3046970.0, ans=0.125 2024-08-15 06:19:58,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3046970.0, ans=0.025 2024-08-15 06:20:11,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3047070.0, ans=0.0 2024-08-15 06:20:19,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3047070.0, ans=0.125 2024-08-15 06:20:26,843 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 06:20:27,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3047170.0, ans=0.2 2024-08-15 06:20:30,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3047170.0, ans=0.1 2024-08-15 06:20:37,839 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 06:20:39,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3047270.0, ans=0.0 2024-08-15 06:20:40,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 400, loss[loss=0.1142, beats_loss=0.008246, ecapa_loss=0.0001873, whisper_loss=0.1041, over 16209.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01019, ecapa_loss=0.0001519, whisper_loss=0.09031, over 3292862.08 frames. 
], batch size: 65, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:20:43,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3047270.0, ans=0.125 2024-08-15 06:20:44,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3047270.0, ans=0.125 2024-08-15 06:20:55,206 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 06:21:13,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3047470.0, ans=0.125 2024-08-15 06:21:24,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3047470.0, ans=0.125 2024-08-15 06:21:26,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3047570.0, ans=0.125 2024-08-15 06:21:27,358 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-15 06:21:30,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3047570.0, ans=0.1 2024-08-15 06:21:39,509 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 06:21:51,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3047670.0, ans=0.125 2024-08-15 06:21:54,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.307e+01 2.559e+01 2.888e+01 1.580e+02, threshold=5.118e+01, percent-clipped=5.0 2024-08-15 06:21:57,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 450, loss[loss=0.09798, beats_loss=0.01023, ecapa_loss=0.0001619, whisper_loss=0.08613, over 22075.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01026, ecapa_loss=0.0001524, whisper_loss=0.08982, over 3434428.40 frames. ], batch size: 87, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:22:03,899 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 06:22:05,404 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-15 06:22:07,036 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-15 06:22:11,621 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 06:22:20,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3047870.0, ans=0.125 2024-08-15 06:22:34,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3047970.0, ans=0.0 2024-08-15 06:22:42,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-08-15 06:22:48,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048070.0, ans=0.125 2024-08-15 06:22:57,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3048170.0, ans=0.2 2024-08-15 06:23:08,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3048170.0, ans=0.0 2024-08-15 06:23:14,468 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 500, loss[loss=0.0952, beats_loss=0.01159, ecapa_loss=0.0001463, whisper_loss=0.08215, over 21761.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01027, ecapa_loss=0.0001515, whisper_loss=0.09008, over 3499711.85 frames. 
], batch size: 89, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:23:19,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048270.0, ans=0.125 2024-08-15 06:23:26,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3048270.0, ans=0.125 2024-08-15 06:23:27,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.39 vs. limit=8.0 2024-08-15 06:23:28,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2024-08-15 06:23:34,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3048370.0, ans=0.125 2024-08-15 06:23:48,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-15 06:23:49,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3048470.0, ans=0.07 2024-08-15 06:24:00,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-15 06:24:19,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3048670.0, ans=0.125 2024-08-15 06:24:20,735 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-15 06:24:26,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3048670.0, ans=0.2 2024-08-15 06:24:27,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048670.0, ans=0.1 2024-08-15 06:24:28,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.268e+01 2.600e+01 2.909e+01 8.676e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-15 06:24:28,959 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 06:24:30,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3048770.0, ans=0.0 2024-08-15 06:24:31,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 550, loss[loss=0.1155, beats_loss=0.009793, ecapa_loss=0.0001393, whisper_loss=0.1044, over 16007.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01025, ecapa_loss=0.0001519, whisper_loss=0.09083, over 3575605.32 frames. ], batch size: 62, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:24:31,873 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 06:24:43,889 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 06:24:50,487 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-15 06:25:01,066 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 06:25:04,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3048970.0, ans=0.125 2024-08-15 06:25:13,259 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
21 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 06:25:32,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3049170.0, ans=10.0 2024-08-15 06:25:47,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3049270.0, ans=0.2 2024-08-15 06:25:48,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 600, loss[loss=0.1042, beats_loss=0.01057, ecapa_loss=0.0001634, whisper_loss=0.09202, over 16909.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001512, whisper_loss=0.09025, over 3651994.73 frames. ], batch size: 68, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:26:09,104 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 06:26:18,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. 
limit=15.0 2024-08-15 06:26:25,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3049470.0, ans=0.2 2024-08-15 06:26:32,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3049470.0, ans=0.05 2024-08-15 06:26:39,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3049570.0, ans=0.2 2024-08-15 06:26:39,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3049570.0, ans=0.125 2024-08-15 06:26:42,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3049570.0, ans=0.09899494936611666 2024-08-15 06:27:03,919 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.381e+01 2.532e+01 2.729e+01 4.299e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-15 06:27:07,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 650, loss[loss=0.09711, beats_loss=0.01299, ecapa_loss=9.345e-05, whisper_loss=0.08318, over 18970.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001504, whisper_loss=0.09032, over 3702064.71 frames. ], batch size: 72, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:27:11,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3049770.0, ans=0.125 2024-08-15 06:27:11,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-15 06:28:16,754 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 06:28:21,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3050170.0, ans=0.125 2024-08-15 06:28:23,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 700, loss[loss=0.08162, beats_loss=0.008227, ecapa_loss=0.0001774, whisper_loss=0.07162, over 13217.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001508, whisper_loss=0.09104, over 3730077.40 frames. ], batch size: 53, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:28:25,778 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 06:28:30,078 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 06:29:01,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3050470.0, ans=0.125 2024-08-15 06:29:01,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3050470.0, ans=0.125 2024-08-15 06:29:13,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3050570.0, ans=0.1 2024-08-15 06:29:15,957 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 12 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-15 06:29:16,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3050570.0, ans=0.2 2024-08-15 06:29:36,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2024-08-15 06:29:38,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.268e+01 2.485e+01 2.982e+01 6.162e+01, threshold=4.969e+01, percent-clipped=2.0 2024-08-15 06:29:41,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 750, loss[loss=0.09631, beats_loss=0.01022, ecapa_loss=0.0001486, whisper_loss=0.0846, over 17941.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001502, whisper_loss=0.09059, over 3770268.03 frames. ], batch size: 69, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:29:47,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2024-08-15 06:29:54,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3050770.0, ans=0.125 2024-08-15 06:30:00,161 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-15 06:30:31,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3051070.0, ans=0.07 2024-08-15 06:30:32,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3051070.0, ans=0.0 2024-08-15 06:30:33,724 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 06:30:38,620 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 06:30:53,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3051170.0, ans=0.125 2024-08-15 06:30:55,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3051170.0, ans=0.125 2024-08-15 06:30:57,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 800, loss[loss=0.1044, beats_loss=0.01068, ecapa_loss=0.0001624, whisper_loss=0.09208, over 21666.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001499, whisper_loss=0.09054, over 3782194.25 frames. ], batch size: 91, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:32:09,029 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 06:32:11,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.305e+01 2.508e+01 2.943e+01 4.012e+02, threshold=5.016e+01, percent-clipped=1.0 2024-08-15 06:32:14,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 850, loss[loss=0.1015, beats_loss=0.009702, ecapa_loss=0.0001517, whisper_loss=0.09025, over 19671.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001488, whisper_loss=0.09066, over 3806121.30 frames. ], batch size: 76, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:33:01,805 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:33:10,758 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
36 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 06:33:15,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3052070.0, ans=0.5 2024-08-15 06:33:19,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3052170.0, ans=0.125 2024-08-15 06:33:21,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3052170.0, ans=0.0 2024-08-15 06:33:26,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3052170.0, ans=0.0 2024-08-15 06:33:30,991 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 06:33:33,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 900, loss[loss=0.1237, beats_loss=0.009578, ecapa_loss=0.0001432, whisper_loss=0.1127, over 17936.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0103, ecapa_loss=0.0001494, whisper_loss=0.09161, over 3837921.13 frames. ], batch size: 70, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:34:01,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3052370.0, ans=0.125 2024-08-15 06:34:01,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0 2024-08-15 06:34:27,700 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 06:34:29,096 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
28 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-15 06:34:37,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3052670.0, ans=0.0 2024-08-15 06:34:47,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.568e+01 3.064e+01 1.106e+02, threshold=5.136e+01, percent-clipped=1.0 2024-08-15 06:34:50,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 950, loss[loss=0.0663, beats_loss=0.01129, ecapa_loss=0.0001474, whisper_loss=0.05354, over 20211.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001483, whisper_loss=0.09108, over 3872320.91 frames. ], batch size: 80, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:35:06,532 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 06:35:15,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3052870.0, ans=0.0 2024-08-15 06:35:16,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3052870.0, ans=0.0 2024-08-15 06:35:21,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3052970.0, ans=0.5 2024-08-15 06:35:29,064 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-15 06:35:46,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-08-15 06:35:51,876 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 06:35:55,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. 
limit=15.0 2024-08-15 06:36:01,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3053170.0, ans=0.035 2024-08-15 06:36:08,414 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1000, loss[loss=0.08326, beats_loss=0.01168, ecapa_loss=0.0001232, whisper_loss=0.07034, over 16740.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001479, whisper_loss=0.09043, over 3868256.51 frames. ], batch size: 63, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:36:24,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3053370.0, ans=0.125 2024-08-15 06:36:24,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2024-08-15 06:36:25,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3053370.0, ans=0.125 2024-08-15 06:36:50,882 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 6 from Vox, 38 fro AS 2024-08-15 06:37:14,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3053670.0, ans=0.05 2024-08-15 06:37:16,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2024-08-15 06:37:18,366 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
33 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 06:37:18,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3053670.0, ans=0.2 2024-08-15 06:37:23,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.275e+01 2.548e+01 2.900e+01 4.496e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-15 06:37:26,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1050, loss[loss=0.1202, beats_loss=0.01195, ecapa_loss=0.0001299, whisper_loss=0.1069, over 15161.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001474, whisper_loss=0.09053, over 3879142.89 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:37:41,385 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08964061737060547, model_norm_threshold=50.96524429321289 2024-08-15 06:37:41,574 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.115e+05, grad_sumsq=1.116e+07, orig_rms_sq=9.994e-03 2024-08-15 06:37:58,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3053970.0, ans=0.5 2024-08-15 06:38:07,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3053970.0, ans=0.2 2024-08-15 06:38:17,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3054070.0, ans=0.125 2024-08-15 06:38:18,938 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-15 06:38:43,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1100, loss[loss=0.124, beats_loss=0.01021, ecapa_loss=0.0001414, whisper_loss=0.1124, over 14739.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001475, whisper_loss=0.09008, over 3839991.13 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:38:51,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3054270.0, ans=0.1 2024-08-15 06:39:14,104 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 06:39:38,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5 2024-08-15 06:39:57,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3054570.0, ans=0.0 2024-08-15 06:39:57,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-15 06:40:04,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054570.0, ans=0.1 2024-08-15 06:40:09,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3054570.0, ans=0.125 2024-08-15 06:40:12,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3054570.0, ans=0.125 2024-08-15 06:40:15,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-15 06:40:24,766 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-15 06:40:32,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.402e+01 2.652e+01 3.039e+01 5.686e+02, threshold=5.304e+01, percent-clipped=1.0 2024-08-15 06:40:34,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1150, loss[loss=0.1093, beats_loss=0.009332, ecapa_loss=0.0001718, whisper_loss=0.09822, over 21241.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01031, ecapa_loss=0.0001483, whisper_loss=0.09118, over 3848600.87 frames. ], batch size: 86, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:40:35,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054770.0, ans=0.1 2024-08-15 06:40:38,576 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 06:40:44,428 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-15 06:40:48,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3054870.0, ans=0.1 2024-08-15 06:41:23,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3055070.0, ans=0.125 2024-08-15 06:41:33,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-08-15 06:41:33,526 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-15 06:42:03,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1200, loss[loss=0.09165, beats_loss=0.009452, ecapa_loss=0.0001466, whisper_loss=0.08073, over 15097.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001485, whisper_loss=0.09034, over 3830991.40 frames. 
], batch size: 55, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:42:08,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3055270.0, ans=0.0 2024-08-15 06:42:13,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3055270.0, ans=0.0 2024-08-15 06:42:18,603 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:42:31,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3055370.0, ans=0.1 2024-08-15 06:42:34,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-15 06:42:40,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3055370.0, ans=0.2 2024-08-15 06:42:49,434 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 33 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 06:43:01,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3055470.0, ans=0.125 2024-08-15 06:43:26,275 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 06:43:26,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3055570.0, ans=0.125 2024-08-15 06:43:26,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2024-08-15 06:43:36,984 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
18 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-15 06:43:43,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.260e+01 2.457e+01 2.910e+01 3.777e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-15 06:43:48,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1250, loss[loss=0.0913, beats_loss=0.01187, ecapa_loss=0.0001399, whisper_loss=0.07803, over 17426.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001479, whisper_loss=0.08996, over 3817173.45 frames. ], batch size: 69, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:44:12,755 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 06:44:50,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3055970.0, ans=0.0 2024-08-15 06:45:02,259 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 06:45:20,588 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 06:45:28,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3056170.0, ans=0.125 2024-08-15 06:45:43,136 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 06:45:53,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1300, loss[loss=0.08899, beats_loss=0.01005, ecapa_loss=0.0001719, whisper_loss=0.07722, over 16585.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001471, whisper_loss=0.0897, over 3796263.67 frames. 
], batch size: 67, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:46:03,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3056270.0, ans=0.125 2024-08-15 06:46:27,257 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 12 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-15 06:46:41,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3056470.0, ans=0.2 2024-08-15 06:46:42,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3056470.0, ans=0.125 2024-08-15 06:46:42,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3056470.0, ans=0.5 2024-08-15 06:46:42,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3056470.0, ans=0.1 2024-08-15 06:47:05,462 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 06:47:11,978 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 06:47:29,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3056670.0, ans=0.09899494936611666 2024-08-15 06:47:37,043 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 06:47:51,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.277e+01 2.465e+01 2.867e+01 3.912e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-15 06:47:52,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. 
limit=15.0 2024-08-15 06:47:55,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1350, loss[loss=0.1371, beats_loss=0.004295, ecapa_loss=0.0001439, whisper_loss=0.1313, over 16651.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001472, whisper_loss=0.08953, over 3776953.62 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:48:43,827 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 06:49:48,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1400, loss[loss=0.0966, beats_loss=0.01256, ecapa_loss=0.0001295, whisper_loss=0.08274, over 18615.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001475, whisper_loss=0.09009, over 3791431.44 frames. ], batch size: 75, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:49:51,682 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 06:50:37,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-15 06:50:40,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3057570.0, ans=0.1 2024-08-15 06:50:50,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-08-15 06:50:54,018 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-15 06:50:58,859 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
19 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-15 06:51:08,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057670.0, ans=0.1 2024-08-15 06:51:10,279 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 06:51:12,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3057670.0, ans=0.125 2024-08-15 06:51:13,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.207e+01 2.496e+01 2.856e+01 4.886e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 06:51:56,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1450, loss[loss=0.08964, beats_loss=0.01035, ecapa_loss=0.0001275, whisper_loss=0.07801, over 14916.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001481, whisper_loss=0.08953, over 3798289.19 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:51:59,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.71 vs. limit=15.0 2024-08-15 06:52:03,988 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 06:52:29,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3057870.0, ans=0.0 2024-08-15 06:52:41,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3057970.0, ans=0.125 2024-08-15 06:52:54,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3058070.0, ans=0.2 2024-08-15 06:52:59,383 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-15 06:52:59,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3058070.0, ans=0.05 2024-08-15 06:53:05,265 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 06:53:06,623 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 06:53:08,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3058170.0, ans=0.0 2024-08-15 06:53:19,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3058170.0, ans=0.125 2024-08-15 06:53:28,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1500, loss[loss=0.1184, beats_loss=0.008092, ecapa_loss=0.0001954, whisper_loss=0.1084, over 16134.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01048, ecapa_loss=0.0001471, whisper_loss=0.08915, over 3787591.22 frames. 
], batch size: 63, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:53:39,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3058270.0, ans=0.07 2024-08-15 06:53:49,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058370.0, ans=0.1 2024-08-15 06:53:51,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3058370.0, ans=0.0 2024-08-15 06:54:09,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3058470.0, ans=0.125 2024-08-15 06:54:50,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3058670.0, ans=0.2 2024-08-15 06:54:58,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.187e+01 2.410e+01 2.690e+01 4.725e+01, threshold=4.819e+01, percent-clipped=0.0 2024-08-15 06:55:02,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1550, loss[loss=0.09809, beats_loss=0.009516, ecapa_loss=0.0001405, whisper_loss=0.08717, over 22152.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01055, ecapa_loss=0.0001467, whisper_loss=0.089, over 3780109.24 frames. ], batch size: 85, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:55:02,607 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-15 06:55:46,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3058970.0, ans=0.125 2024-08-15 06:55:47,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3058970.0, ans=0.07 2024-08-15 06:56:11,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3059070.0, ans=0.125 2024-08-15 06:56:19,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3059170.0, ans=0.0 2024-08-15 06:56:25,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059170.0, ans=0.1 2024-08-15 06:56:32,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1600, loss[loss=0.09463, beats_loss=0.01264, ecapa_loss=0.0001171, whisper_loss=0.08082, over 21531.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0106, ecapa_loss=0.0001458, whisper_loss=0.08874, over 3792569.43 frames. ], batch size: 84, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:56:37,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3059270.0, ans=0.0 2024-08-15 06:56:51,297 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 06:57:02,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3059370.0, ans=0.2 2024-08-15 06:57:02,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=12.0 2024-08-15 06:57:07,039 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
27 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 06:57:23,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3059470.0, ans=0.0 2024-08-15 06:57:41,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3059570.0, ans=0.125 2024-08-15 06:57:53,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3059670.0, ans=0.125 2024-08-15 06:57:59,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.263e+01 2.473e+01 2.776e+01 4.144e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-15 06:58:02,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1650, loss[loss=0.1047, beats_loss=0.01171, ecapa_loss=0.0001181, whisper_loss=0.09185, over 22797.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01058, ecapa_loss=0.0001462, whisper_loss=0.0888, over 3806531.38 frames. ], batch size: 89, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:58:12,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3059770.0, ans=0.1 2024-08-15 06:58:12,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-15 06:58:14,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-15 06:58:18,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.79 vs. 
limit=12.0 2024-08-15 06:58:33,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3059870.0, ans=0.2 2024-08-15 06:58:39,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3059970.0, ans=0.125 2024-08-15 06:58:49,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3059970.0, ans=0.125 2024-08-15 06:58:50,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3059970.0, ans=0.125 2024-08-15 06:59:30,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1700, loss[loss=0.09371, beats_loss=0.01155, ecapa_loss=0.0001356, whisper_loss=0.08081, over 14164.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01064, ecapa_loss=0.0001462, whisper_loss=0.08854, over 3809422.68 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:59:39,788 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 06:59:41,331 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 06:59:45,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3060270.0, ans=0.125 2024-08-15 07:00:12,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-15 07:00:13,509 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 07:00:20,649 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
31 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-15 07:00:24,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3060570.0, ans=0.0 2024-08-15 07:00:43,556 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 07:00:51,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.353e+01 2.609e+01 2.862e+01 3.979e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-15 07:00:54,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1750, loss[loss=0.1076, beats_loss=0.01027, ecapa_loss=0.000132, whisper_loss=0.09605, over 23660.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001463, whisper_loss=0.08912, over 3792132.23 frames. ], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:01:01,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-15 07:01:09,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2024-08-15 07:01:15,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3060870.0, ans=0.2 2024-08-15 07:01:18,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3060870.0, ans=0.1 2024-08-15 07:01:19,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3060870.0, ans=0.125 2024-08-15 07:01:27,928 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
11 from LS+wenet, 13 from Vox, 31 from AS
2024-08-15 07:01:31,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3060970.0, ans=0.0
2024-08-15 07:01:40,569 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 from AS
2024-08-15 07:01:45,532 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 from AS
2024-08-15 07:01:48,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0
2024-08-15 07:02:15,282 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1800, loss[loss=0.09765, beats_loss=0.0123, ecapa_loss=0.0001293, whisper_loss=0.08406, over 20536.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.000147, whisper_loss=0.08922, over 3780931.02 frames. ], batch size: 79, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:02:23,796 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS
2024-08-15 07:02:23,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3061270.0, ans=0.1
2024-08-15 07:02:34,784 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 17 from Vox, 23 from AS
2024-08-15 07:02:41,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3061370.0, ans=0.0
2024-08-15 07:02:52,581 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 from AS
2024-08-15 07:03:15,884 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 24 from Vox, 31 from AS
2024-08-15 07:03:21,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0
2024-08-15 07:03:31,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.281e+01 2.524e+01 2.715e+01 4.496e+01, threshold=5.048e+01, percent-clipped=0.0
2024-08-15 07:03:35,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1850, loss[loss=0.08942, beats_loss=0.009948, ecapa_loss=0.0001402, whisper_loss=0.07807, over 23589.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001462, whisper_loss=0.08931, over 3790535.01 frames. ], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:03:39,659 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 from AS
2024-08-15 07:03:46,703 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 from AS
2024-08-15 07:03:50,160 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 from AS
2024-08-15 07:04:02,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3061870.0, ans=0.07
2024-08-15 07:04:26,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3062070.0, ans=0.0
2024-08-15 07:04:36,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3062070.0, ans=0.125
2024-08-15 07:04:55,910 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1900, loss[loss=0.09562, beats_loss=0.01097, ecapa_loss=0.0001548, whisper_loss=0.0831, over 20362.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001452, whisper_loss=0.08957, over 3799261.17 frames. ], batch size: 80, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:05:08,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.0
2024-08-15 07:05:14,007 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 from AS
2024-08-15 07:05:15,785 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 12 from Vox, 27 from AS
2024-08-15 07:05:22,463 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 from AS
2024-08-15 07:05:28,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3062470.0, ans=0.125
2024-08-15 07:05:32,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5
2024-08-15 07:05:48,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3062570.0, ans=0.0
2024-08-15 07:05:49,368 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 14 from Vox, 38 from AS
2024-08-15 07:05:57,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062570.0, ans=0.1
2024-08-15 07:06:07,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3062670.0, ans=0.05
2024-08-15 07:06:07,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3062670.0, ans=0.95
2024-08-15 07:06:10,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3062670.0, ans=0.0
2024-08-15 07:06:12,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.578e+01 2.950e+01 1.570e+02, threshold=5.156e+01, percent-clipped=1.0
2024-08-15 07:06:15,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 1950, loss[loss=0.1185, beats_loss=0.009596, ecapa_loss=0.0001643, whisper_loss=0.1073, over 17766.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001457, whisper_loss=0.09056, over 3799269.39 frames. ], batch size: 71, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:06:16,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3062770.0, ans=0.0
2024-08-15 07:06:17,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3062770.0, ans=0.125
2024-08-15 07:06:33,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0
2024-08-15 07:06:34,685 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 from AS
2024-08-15 07:06:42,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3062870.0, ans=0.125
2024-08-15 07:06:44,084 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 from AS
2024-08-15 07:07:21,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3063170.0, ans=0.0
2024-08-15 07:07:32,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=12.0
2024-08-15 07:07:35,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2000, loss[loss=0.1179, beats_loss=0.009198, ecapa_loss=0.0002024, whisper_loss=0.1067, over 19102.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001449, whisper_loss=0.09091, over 3791642.15 frames. ], batch size: 79, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:07:35,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3063270.0, ans=0.0
2024-08-15 07:07:35,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3063270.0, ans=0.0
2024-08-15 07:07:56,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3063370.0, ans=0.125
2024-08-15 07:08:51,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.304e+01 2.556e+01 2.865e+01 6.565e+01, threshold=5.113e+01, percent-clipped=1.0
2024-08-15 07:08:54,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2050, loss[loss=0.1091, beats_loss=0.01141, ecapa_loss=0.0001198, whisper_loss=0.09655, over 19954.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001461, whisper_loss=0.09048, over 3796402.88 frames. ], batch size: 79, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:09:13,970 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 from AS
2024-08-15 07:09:21,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3063870.0, ans=0.0
2024-08-15 07:09:25,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.97 vs. limit=12.0
2024-08-15 07:09:43,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.67 vs. limit=22.5
2024-08-15 07:10:10,246 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 from AS
2024-08-15 07:10:12,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2100, loss[loss=0.08178, beats_loss=0.01068, ecapa_loss=0.0001678, whisper_loss=0.06942, over 14177.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001451, whisper_loss=0.08974, over 3772750.08 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:10:20,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5
2024-08-15 07:10:24,409 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 from AS
2024-08-15 07:10:27,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0
2024-08-15 07:10:40,035 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 31 from Vox, 40 from AS
2024-08-15 07:10:44,814 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 07:10:49,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0
2024-08-15 07:11:25,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3064670.0, ans=0.125
2024-08-15 07:11:28,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.342e+01 2.621e+01 2.964e+01 3.863e+02, threshold=5.241e+01, percent-clipped=3.0
2024-08-15 07:11:31,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3064770.0, ans=0.125
2024-08-15 07:11:32,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2150, loss[loss=0.1001, beats_loss=0.007129, ecapa_loss=0.0001381, whisper_loss=0.09163, over 14119.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01074, ecapa_loss=0.0001454, whisper_loss=0.08905, over 3788838.82 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:12:00,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3064870.0, ans=0.125
2024-08-15 07:12:18,238 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 from AS
2024-08-15 07:12:21,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3065070.0, ans=0.125
2024-08-15 07:12:23,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3065070.0, ans=0.05
2024-08-15 07:12:24,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3065070.0, ans=0.2
2024-08-15 07:12:33,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3065070.0, ans=0.125
2024-08-15 07:12:33,328 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 07:12:46,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0
2024-08-15 07:12:47,829 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 15 from Vox, 41 from AS
2024-08-15 07:12:53,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2200, loss[loss=0.06977, beats_loss=0.01331, ecapa_loss=0.0001106, whisper_loss=0.05535, over 14278.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001454, whisper_loss=0.08975, over 3803086.93 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:13:02,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065270.0, ans=0.1
2024-08-15 07:13:18,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0
2024-08-15 07:13:26,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3065470.0, ans=0.125
2024-08-15 07:13:29,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0
2024-08-15 07:13:46,344 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 30 from LS+wenet, 24 from Vox, 42 from AS
2024-08-15 07:13:46,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3065570.0, ans=0.2
2024-08-15 07:13:54,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0
2024-08-15 07:13:59,384 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 07:14:00,740 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 from AS
2024-08-15 07:14:09,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.287e+01 2.531e+01 2.773e+01 4.088e+01, threshold=5.062e+01, percent-clipped=0.0
2024-08-15 07:14:11,823 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS
2024-08-15 07:14:12,886 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2250, loss[loss=0.09965, beats_loss=0.009072, ecapa_loss=0.0001322, whisper_loss=0.08926, over 15394.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01067, ecapa_loss=0.000146, whisper_loss=0.09007, over 3863859.02 frames. ], batch size: 58, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:14:21,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3065770.0, ans=0.125
2024-08-15 07:14:46,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2024-08-15 07:15:01,477 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS
2024-08-15 07:15:12,962 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 from AS
2024-08-15 07:15:13,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3066070.0, ans=0.0
2024-08-15 07:15:16,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3066170.0, ans=0.0
2024-08-15 07:15:29,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0
2024-08-15 07:15:31,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3066170.0, ans=0.2
2024-08-15 07:15:34,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2300, loss[loss=0.097, beats_loss=0.01195, ecapa_loss=0.000131, whisper_loss=0.08374, over 22280.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001466, whisper_loss=0.09104, over 3903657.85 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:15:41,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.76 vs. limit=22.5
2024-08-15 07:15:47,812 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 24 from Vox, 40 from AS
2024-08-15 07:15:56,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3066370.0, ans=0.125
2024-08-15 07:16:03,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3066370.0, ans=0.125
2024-08-15 07:16:33,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.56 vs. limit=10.0
2024-08-15 07:16:46,437 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-15 07:16:50,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.352e+01 2.574e+01 2.900e+01 4.946e+01, threshold=5.147e+01, percent-clipped=0.0
2024-08-15 07:16:53,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2350, loss[loss=0.1089, beats_loss=0.01003, ecapa_loss=0.0001617, whisper_loss=0.09726, over 15354.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001471, whisper_loss=0.09049, over 3884597.62 frames. ], batch size: 61, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:17:10,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3066870.0, ans=0.125
2024-08-15 07:17:21,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3066870.0, ans=0.05
2024-08-15 07:17:23,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3066870.0, ans=0.09899494936611666
2024-08-15 07:17:31,525 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 30 from Vox, 34 from AS
2024-08-15 07:17:44,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3067070.0, ans=0.125
2024-08-15 07:17:57,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3067170.0, ans=0.0
2024-08-15 07:18:12,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2400, loss[loss=0.1183, beats_loss=0.01027, ecapa_loss=0.0001361, whisper_loss=0.1066, over 22633.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001476, whisper_loss=0.09078, over 3883700.37 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:18:19,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3067270.0, ans=0.125
2024-08-15 07:18:23,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3067270.0, ans=0.125
2024-08-15 07:18:27,691 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 12 from Vox, 28 from AS
2024-08-15 07:18:43,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3067470.0, ans=0.1
2024-08-15 07:19:10,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3067570.0, ans=0.2
2024-08-15 07:19:25,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3067670.0, ans=0.2
2024-08-15 07:19:25,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3067670.0, ans=0.0
2024-08-15 07:19:26,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.236e+01 2.451e+01 2.781e+01 1.373e+02, threshold=4.902e+01, percent-clipped=2.0
2024-08-15 07:19:27,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3067670.0, ans=0.125
2024-08-15 07:19:28,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3067770.0, ans=0.0
2024-08-15 07:19:29,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2450, loss[loss=0.1093, beats_loss=0.008258, ecapa_loss=0.0001439, whisper_loss=0.09956, over 16757.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001477, whisper_loss=0.09012, over 3915075.29 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:19:33,431 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 from AS
2024-08-15 07:19:35,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3067770.0, ans=0.125
2024-08-15 07:19:41,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3067770.0, ans=0.0
2024-08-15 07:19:44,919 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-15 07:20:34,461 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS
2024-08-15 07:20:41,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3068170.0, ans=0.2
2024-08-15 07:20:44,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3068170.0, ans=0.04949747468305833
2024-08-15 07:20:45,809 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 31 from LS+wenet, 16 from Vox, 28 from AS
2024-08-15 07:20:46,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3068170.0, ans=0.0
2024-08-15 07:20:48,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2500, loss[loss=0.1083, beats_loss=0.01087, ecapa_loss=0.0001572, whisper_loss=0.09589, over 19898.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001477, whisper_loss=0.0911, over 3912527.27 frames. ], batch size: 82, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:20:50,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3068270.0, ans=0.125
2024-08-15 07:21:12,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3068370.0, ans=0.0
2024-08-15 07:21:22,387 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 from AS
2024-08-15 07:21:23,877 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 17 from Vox, 38 from AS
2024-08-15 07:21:27,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3068470.0, ans=0.0
2024-08-15 07:21:31,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3068470.0, ans=0.1
2024-08-15 07:21:34,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3068570.0, ans=0.125
2024-08-15 07:21:49,303 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 25 from LS+wenet, 14 from Vox, 21 from AS
2024-08-15 07:22:01,495 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 28 from Vox, 33 from AS
2024-08-15 07:22:04,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.249e+01 2.498e+01 2.918e+01 4.518e+01, threshold=4.995e+01, percent-clipped=0.0
2024-08-15 07:22:07,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2550, loss[loss=0.1034, beats_loss=0.01179, ecapa_loss=0.0001626, whisper_loss=0.08998, over 21397.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.000148, whisper_loss=0.09113, over 3900138.18 frames. ], batch size: 91, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:22:19,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3068770.0, ans=0.125
2024-08-15 07:22:27,692 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 from AS
2024-08-15 07:22:58,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.93 vs. limit=22.5
2024-08-15 07:23:01,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=12.0
2024-08-15 07:23:03,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2024-08-15 07:23:05,156 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 07:23:06,774 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.108e+00
2024-08-15 07:23:21,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3069170.0, ans=0.125
2024-08-15 07:23:25,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2600, loss[loss=0.1156, beats_loss=0.00873, ecapa_loss=0.0001675, whisper_loss=0.1052, over 19177.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001492, whisper_loss=0.09122, over 3873564.63 frames. ], batch size: 75, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:23:29,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3069270.0, ans=0.0
2024-08-15 07:23:43,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3069370.0, ans=0.125
2024-08-15 07:23:49,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3069370.0, ans=0.1
2024-08-15 07:24:05,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3069470.0, ans=0.125
2024-08-15 07:24:08,486 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 from AS
2024-08-15 07:24:08,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3069470.0, ans=0.125
2024-08-15 07:24:13,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3069570.0, ans=0.125
2024-08-15 07:24:40,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.364e+01 2.551e+01 2.920e+01 2.244e+02, threshold=5.103e+01, percent-clipped=2.0
2024-08-15 07:24:43,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2650, loss[loss=0.1082, beats_loss=0.008793, ecapa_loss=0.0001881, whisper_loss=0.09755, over 21060.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.000149, whisper_loss=0.09112, over 3881221.13 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:24:52,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0
2024-08-15 07:25:14,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3069970.0, ans=0.125
2024-08-15 07:25:24,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5
2024-08-15 07:25:42,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3070070.0, ans=0.125
2024-08-15 07:25:42,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0
2024-08-15 07:25:47,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3070170.0, ans=0.0
2024-08-15 07:25:48,693 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 from AS
2024-08-15 07:26:00,949 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 from AS
2024-08-15 07:26:02,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2700, loss[loss=0.1071, beats_loss=0.00888, ecapa_loss=0.0001776, whisper_loss=0.09645, over 21241.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001495, whisper_loss=0.09076, over 3880228.24 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:26:15,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3070270.0, ans=0.1
2024-08-15 07:26:15,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3070270.0, ans=0.0
2024-08-15 07:26:26,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3070370.0, ans=0.0
2024-08-15 07:26:29,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3070370.0, ans=0.0
2024-08-15 07:26:36,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3070470.0, ans=0.0
2024-08-15 07:26:59,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3070570.0, ans=0.2
2024-08-15 07:27:01,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3070570.0, ans=0.0
2024-08-15 07:27:03,019 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 from AS
2024-08-15 07:27:17,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.236e+01 2.522e+01 2.732e+01 2.329e+02, threshold=5.045e+01, percent-clipped=1.0
2024-08-15 07:27:21,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2750, loss[loss=0.1107, beats_loss=0.008764, ecapa_loss=0.0001722, whisper_loss=0.1002, over 22113.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.09049, over 3858587.27 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:27:21,551 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 from AS
2024-08-15 07:27:32,247 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 from AS
2024-08-15 07:27:32,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3070770.0, ans=0.2
2024-08-15 07:27:33,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3070770.0, ans=0.1
2024-08-15 07:27:40,658 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 25 from Vox, 22 from AS
2024-08-15 07:28:29,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3071170.0, ans=0.2
2024-08-15 07:28:29,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0
2024-08-15 07:28:39,945 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2800, loss[loss=0.09963, beats_loss=0.01338, ecapa_loss=0.0001324, whisper_loss=0.08493, over 15843.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001481, whisper_loss=0.0906, over 3855309.36 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:28:41,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3071270.0, ans=0.0
2024-08-15 07:28:49,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3071270.0, ans=0.0
2024-08-15 07:28:54,239 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 from AS
2024-08-15 07:28:55,874 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 27 from Vox, 31 from AS
2024-08-15 07:29:01,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3071370.0, ans=0.125
2024-08-15 07:29:09,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3071370.0, ans=0.1
2024-08-15 07:29:34,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3071570.0, ans=0.125
2024-08-15 07:29:35,545 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 9 from Vox, 31 from AS
2024-08-15 07:29:37,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3071570.0, ans=0.2
2024-08-15 07:29:47,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3071670.0, ans=0.5
2024-08-15 07:29:51,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3071670.0, ans=0.125
2024-08-15 07:29:52,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3071670.0, ans=0.125
2024-08-15 07:29:57,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.358e+01 2.558e+01 2.902e+01 4.968e+01, threshold=5.116e+01, percent-clipped=0.0
2024-08-15 07:29:59,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3071770.0, ans=0.125
2024-08-15 07:30:00,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2850, loss[loss=0.08992, beats_loss=0.01008, ecapa_loss=0.0001402, whisper_loss=0.07844, over 14725.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01077, ecapa_loss=0.0001466, whisper_loss=0.0901, over 3852262.51 frames. ], batch size: 58, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:30:13,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3071770.0, ans=0.125
2024-08-15 07:30:20,647 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 from AS
2024-08-15 07:30:26,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3071870.0, ans=0.125
2024-08-15 07:30:42,111 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 22 from Vox, 20 from AS
2024-08-15 07:30:58,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0
2024-08-15 07:31:06,074 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 15 from Vox, 47 from AS
2024-08-15 07:31:09,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3072170.0, ans=0.1
2024-08-15 07:31:12,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5
2024-08-15 07:31:18,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2900, loss[loss=0.1103, beats_loss=0.009919, ecapa_loss=0.0001255, whisper_loss=0.09916, over 15172.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01081, ecapa_loss=0.0001474, whisper_loss=0.08983, over 3863713.21 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:31:44,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3072370.0, ans=0.2
2024-08-15 07:32:13,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3072570.0, ans=0.0
2024-08-15 07:32:21,619 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS
2024-08-15 07:32:23,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3072670.0, ans=0.125
2024-08-15 07:32:24,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0
2024-08-15 07:32:28,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3072670.0, ans=0.125
2024-08-15 07:32:32,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.406e+01 2.599e+01 2.781e+01 4.544e+01, threshold=5.197e+01, percent-clipped=0.0
2024-08-15 07:32:35,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 2950, loss[loss=0.09657, beats_loss=0.00998, ecapa_loss=0.0001541, whisper_loss=0.08505, over 21624.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01079, ecapa_loss=0.0001484, whisper_loss=0.08955, over 3842966.40 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:32:39,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=12.0
2024-08-15 07:32:47,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=10.0
2024-08-15 07:32:51,222 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts.
29 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 07:32:51,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3072870.0, ans=0.1 2024-08-15 07:32:57,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-15 07:32:59,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0 2024-08-15 07:33:00,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2024-08-15 07:33:09,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3072970.0, ans=0.125 2024-08-15 07:33:26,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3073070.0, ans=15.0 2024-08-15 07:33:41,081 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 07:33:48,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3000, loss[loss=0.1152, beats_loss=0.008742, ecapa_loss=0.0001633, whisper_loss=0.1048, over 22517.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01075, ecapa_loss=0.0001477, whisper_loss=0.08956, over 3864711.52 frames. ], batch size: 91, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:33:48,965 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 07:34:30,924 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005255, whisper_loss=0.2469, over 922467.00 frames. 
2024-08-15 07:34:46,550 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on SV_voxceleb1: loss=0.004113, beats_loss=0, ecapa_loss=0.0004113, whisper_loss=0, over 939242.00 frames. 2024-08-15 07:36:48,572 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 07:36:48,577 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 07:37:25,281 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 07:37:30,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3073470.0, ans=0.125 2024-08-15 07:37:57,201 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 07:37:59,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.341e+01 2.586e+01 2.957e+01 4.096e+01, threshold=5.172e+01, percent-clipped=0.0 2024-08-15 07:38:02,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3050, loss[loss=0.11, beats_loss=0.008367, ecapa_loss=0.000138, whisper_loss=0.1003, over 14961.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001479, whisper_loss=0.09079, over 3880079.83 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:38:11,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3073770.0, ans=0.0 2024-08-15 07:38:29,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3073870.0, ans=0.125 2024-08-15 07:38:31,797 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-15 07:38:32,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3073970.0, ans=0.05 2024-08-15 07:39:01,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3074170.0, ans=0.125 2024-08-15 07:39:14,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3074270.0, ans=0.5 2024-08-15 07:39:15,047 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3100, loss[loss=0.08959, beats_loss=0.01157, ecapa_loss=0.0001523, whisper_loss=0.0765, over 18542.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001495, whisper_loss=0.09024, over 3853822.77 frames. ], batch size: 75, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:39:17,873 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 07:39:21,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074270.0, ans=0.1 2024-08-15 07:39:26,382 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-15 07:39:34,091 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 07:39:41,013 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 07:39:52,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074470.0, ans=0.1 2024-08-15 07:40:01,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3074570.0, ans=0.0 2024-08-15 07:40:23,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.264e+01 2.554e+01 2.842e+01 4.812e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-15 07:40:26,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3150, loss[loss=0.1238, beats_loss=0.007563, ecapa_loss=0.0001811, whisper_loss=0.1144, over 21884.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001506, whisper_loss=0.09112, over 3867153.91 frames. ], batch size: 89, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:40:43,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3074870.0, ans=0.125 2024-08-15 07:40:48,929 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 07:40:53,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3074870.0, ans=0.2 2024-08-15 07:41:18,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3075070.0, ans=0.0 2024-08-15 07:41:24,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3075170.0, ans=0.0 2024-08-15 07:41:32,810 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
26 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 07:41:35,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3075170.0, ans=0.125 2024-08-15 07:41:39,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.88 vs. limit=6.0 2024-08-15 07:41:39,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3200, loss[loss=0.1025, beats_loss=0.01289, ecapa_loss=0.000131, whisper_loss=0.08825, over 19051.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0107, ecapa_loss=0.0001494, whisper_loss=0.09165, over 3878847.14 frames. ], batch size: 74, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:42:00,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3075370.0, ans=0.125 2024-08-15 07:42:07,811 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 07:42:15,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.62 vs. 
limit=22.5 2024-08-15 07:42:22,242 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 07:42:22,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3075570.0, ans=0.0 2024-08-15 07:42:25,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3075570.0, ans=0.125 2024-08-15 07:42:26,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3075570.0, ans=0.125 2024-08-15 07:42:26,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-15 07:42:38,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3075670.0, ans=0.2 2024-08-15 07:42:47,874 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 07:42:48,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.323e+01 2.639e+01 2.854e+01 4.930e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-15 07:42:51,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3250, loss[loss=0.08916, beats_loss=0.01143, ecapa_loss=0.0001609, whisper_loss=0.07612, over 17970.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01063, ecapa_loss=0.0001497, whisper_loss=0.0922, over 3915033.27 frames. ], batch size: 73, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:42:57,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.25 vs. 
limit=15.0 2024-08-15 07:43:11,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3075870.0, ans=0.0 2024-08-15 07:43:52,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-15 07:43:52,996 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 07:43:59,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3076170.0, ans=0.0 2024-08-15 07:44:01,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3300, loss[loss=0.1482, beats_loss=0.008226, ecapa_loss=0.000125, whisper_loss=0.1388, over 20157.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001505, whisper_loss=0.09171, over 3897466.13 frames. ], batch size: 72, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:44:10,878 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-15 07:44:11,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3076270.0, ans=0.125 2024-08-15 07:45:02,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3076670.0, ans=0.125 2024-08-15 07:45:03,082 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 07:45:03,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3076670.0, ans=0.125 2024-08-15 07:45:11,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.361e+01 2.617e+01 2.908e+01 9.847e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-15 07:45:13,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3350, loss[loss=0.1021, beats_loss=0.009328, ecapa_loss=0.0001542, whisper_loss=0.09126, over 15167.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01057, ecapa_loss=0.0001513, whisper_loss=0.09146, over 3892363.02 frames. ], batch size: 62, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:45:28,611 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 07:45:31,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3076870.0, ans=0.1 2024-08-15 07:45:40,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2024-08-15 07:46:19,505 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 07:46:24,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3400, loss[loss=0.08952, beats_loss=0.01174, ecapa_loss=0.0001341, whisper_loss=0.07643, over 22149.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.00015, whisper_loss=0.09054, over 3906499.32 frames. ], batch size: 86, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:46:37,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-15 07:46:52,782 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 07:46:56,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3077470.0, ans=0.0 2024-08-15 07:46:56,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3077470.0, ans=0.0 2024-08-15 07:47:04,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3077470.0, ans=0.0 2024-08-15 07:47:06,225 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 07:47:32,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.325e+01 2.575e+01 2.904e+01 4.420e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-15 07:47:35,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3450, loss[loss=0.07641, beats_loss=0.01348, ecapa_loss=0.0001502, whisper_loss=0.06142, over 18836.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001496, whisper_loss=0.08966, over 3880707.87 frames. ], batch size: 78, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:48:14,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.75 vs. limit=10.0 2024-08-15 07:48:16,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-15 07:48:20,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3078070.0, ans=0.0 2024-08-15 07:48:47,956 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
26 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 07:48:48,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3078270.0, ans=0.0 2024-08-15 07:48:49,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3500, loss[loss=0.1163, beats_loss=0.007609, ecapa_loss=0.0001703, whisper_loss=0.107, over 17683.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001505, whisper_loss=0.08966, over 3873844.24 frames. ], batch size: 69, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:49:03,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3078370.0, ans=0.125 2024-08-15 07:49:27,709 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 07:49:33,539 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-15 07:49:43,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3078570.0, ans=0.0 2024-08-15 07:49:57,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.337e+01 2.595e+01 2.865e+01 3.542e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-15 07:49:59,969 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3550, loss[loss=0.1155, beats_loss=0.009439, ecapa_loss=0.0001379, whisper_loss=0.1047, over 23465.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001495, whisper_loss=0.09004, over 3895036.77 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:50:18,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3078870.0, ans=0.0 2024-08-15 07:50:23,896 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 07:50:29,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2024-08-15 07:50:37,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3078970.0, ans=0.125 2024-08-15 07:50:45,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2024-08-15 07:50:45,743 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 07:50:53,476 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 07:51:12,197 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3600, loss[loss=0.08813, beats_loss=0.01215, ecapa_loss=0.0001044, whisper_loss=0.07493, over 15749.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001489, whisper_loss=0.09005, over 3879229.35 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:51:33,800 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-15 07:51:51,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3079470.0, ans=0.2 2024-08-15 07:51:52,871 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 07:52:12,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3079670.0, ans=0.125 2024-08-15 07:52:21,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.253e+01 2.466e+01 2.769e+01 4.270e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-15 07:52:24,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3650, loss[loss=0.1354, beats_loss=0.00863, ecapa_loss=0.000127, whisper_loss=0.1255, over 24218.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001489, whisper_loss=0.091, over 3871574.11 frames. ], batch size: 89, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:52:27,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3079770.0, ans=0.0 2024-08-15 07:52:31,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3079770.0, ans=0.125 2024-08-15 07:52:43,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3079870.0, ans=15.0 2024-08-15 07:52:45,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3079870.0, ans=0.125 2024-08-15 07:52:55,725 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-308000.pt 2024-08-15 07:53:00,797 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 31 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 07:53:10,597 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 07:53:12,147 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 07:53:33,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2024-08-15 07:53:38,190 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3700, loss[loss=0.0942, beats_loss=0.01285, ecapa_loss=0.0001439, whisper_loss=0.07991, over 22853.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001483, whisper_loss=0.09074, over 3866869.82 frames. ], batch size: 93, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:53:58,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3080370.0, ans=0.125 2024-08-15 07:54:01,653 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 07:54:01,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3080370.0, ans=0.125 2024-08-15 07:54:07,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3080470.0, ans=0.025 2024-08-15 07:54:27,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3080570.0, ans=0.0 2024-08-15 07:54:39,899 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 07:54:45,095 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.379e+01 2.603e+01 2.924e+01 1.234e+02, threshold=5.207e+01, percent-clipped=1.0 2024-08-15 07:54:47,834 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3750, loss[loss=0.1053, beats_loss=0.01048, ecapa_loss=0.0001533, whisper_loss=0.09328, over 22458.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001491, whisper_loss=0.09033, over 3862077.35 frames. ], batch size: 92, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:55:00,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3080870.0, ans=0.125 2024-08-15 07:55:33,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3081070.0, ans=0.125 2024-08-15 07:55:40,549 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 07:55:42,120 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 07:55:56,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3800, loss[loss=0.0882, beats_loss=0.01432, ecapa_loss=0.0001046, whisper_loss=0.07283, over 24455.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001503, whisper_loss=0.09005, over 3837636.54 frames. ], batch size: 94, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:55:57,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2024-08-15 07:55:59,912 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
18 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-15 07:56:03,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3081270.0, ans=0.0 2024-08-15 07:56:31,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3081470.0, ans=0.125 2024-08-15 07:56:38,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=12.0 2024-08-15 07:56:48,125 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-15 07:56:48,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3081570.0, ans=0.0 2024-08-15 07:56:53,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3081670.0, ans=0.2 2024-08-15 07:56:59,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=22.5 2024-08-15 07:57:00,062 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-15 07:57:00,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3081670.0, ans=0.0 2024-08-15 07:57:02,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.325e+01 2.607e+01 2.959e+01 1.127e+02, threshold=5.215e+01, percent-clipped=1.0 2024-08-15 07:57:05,430 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3850, loss[loss=0.1099, beats_loss=0.009775, ecapa_loss=0.0001257, whisper_loss=0.09886, over 15322.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.08979, over 3846573.00 frames. 
], batch size: 56, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:57:11,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5 2024-08-15 07:57:21,850 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 07:57:40,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3081970.0, ans=0.1 2024-08-15 07:57:52,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.85 vs. limit=5.0 2024-08-15 07:58:00,870 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 07:58:04,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3082170.0, ans=0.1 2024-08-15 07:58:09,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2024-08-15 07:58:14,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3900, loss[loss=0.1166, beats_loss=0.01173, ecapa_loss=0.0001292, whisper_loss=0.1035, over 19589.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001505, whisper_loss=0.09078, over 3866225.94 frames. ], batch size: 76, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:58:27,650 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 07:58:29,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3082370.0, ans=0.2 2024-08-15 07:58:45,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-08-15 07:59:01,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2024-08-15 07:59:02,849 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-15 07:59:05,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2024-08-15 07:59:10,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3082670.0, ans=15.0 2024-08-15 07:59:11,013 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-15 07:59:19,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.371e+01 2.557e+01 2.985e+01 4.331e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 07:59:19,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3082670.0, ans=0.125 2024-08-15 07:59:22,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 3950, loss[loss=0.1196, beats_loss=0.009508, ecapa_loss=0.0001591, whisper_loss=0.1085, over 22659.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01053, ecapa_loss=0.0001506, whisper_loss=0.09166, over 3879575.16 frames. 
], batch size: 92, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:59:54,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3082970.0, ans=0.0 2024-08-15 08:00:00,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2024-08-15 08:00:20,942 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 08:00:22,153 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 08:00:29,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2024-08-15 08:00:31,775 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4000, loss[loss=0.1219, beats_loss=0.008798, ecapa_loss=0.0001544, whisper_loss=0.1116, over 22798.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001502, whisper_loss=0.09168, over 3874739.31 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:00:44,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3083370.0, ans=0.125 2024-08-15 08:00:47,322 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-15 08:00:50,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3083370.0, ans=0.125 2024-08-15 08:00:52,790 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 08:01:15,191 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 08:01:17,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-15 08:01:29,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3083670.0, ans=0.125 2024-08-15 08:01:29,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3083670.0, ans=0.125 2024-08-15 08:01:31,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3083670.0, ans=0.125 2024-08-15 08:01:39,010 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.375e+01 2.633e+01 2.839e+01 4.243e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-15 08:01:42,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4050, loss[loss=0.09313, beats_loss=0.01183, ecapa_loss=0.0001886, whisper_loss=0.07941, over 16598.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001512, whisper_loss=0.0914, over 3848149.29 frames. ], batch size: 70, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:01:50,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3083770.0, ans=0.1 2024-08-15 08:01:53,213 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 08:02:00,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3083870.0, ans=0.0 2024-08-15 08:02:00,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.88 vs. 
limit=10.0 2024-08-15 08:02:10,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3083970.0, ans=0.125 2024-08-15 08:02:10,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-15 08:02:12,968 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 08:02:14,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3083970.0, ans=0.07 2024-08-15 08:02:16,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3083970.0, ans=0.02 2024-08-15 08:02:37,634 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.248e+01 2024-08-15 08:02:40,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=6.0 2024-08-15 08:02:46,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3084170.0, ans=0.125 2024-08-15 08:02:50,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4100, loss[loss=0.1093, beats_loss=0.01223, ecapa_loss=0.0001457, whisper_loss=0.09566, over 21934.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001524, whisper_loss=0.09074, over 3818072.85 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:02:55,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3084270.0, ans=0.125 2024-08-15 08:02:56,673 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 08:03:24,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3084470.0, ans=0.0 2024-08-15 08:03:39,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3084570.0, ans=0.0 2024-08-15 08:03:41,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-15 08:03:48,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3084670.0, ans=0.1 2024-08-15 08:03:57,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.396e+01 2.573e+01 2.888e+01 2.852e+02, threshold=5.147e+01, percent-clipped=1.0 2024-08-15 08:03:57,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-08-15 08:04:00,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4150, loss[loss=0.1073, beats_loss=0.01005, ecapa_loss=0.0001453, whisper_loss=0.09576, over 21897.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001528, whisper_loss=0.09096, over 3832463.85 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:04:02,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-08-15 08:04:21,467 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 08:04:22,881 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 08:04:59,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3085170.0, ans=0.1 2024-08-15 08:05:03,837 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 08:05:09,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4200, loss[loss=0.09431, beats_loss=0.0089, ecapa_loss=0.00014, whisper_loss=0.08401, over 19255.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001535, whisper_loss=0.09071, over 3856776.84 frames. ], batch size: 74, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:05:41,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085470.0, ans=0.1 2024-08-15 08:05:45,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3085470.0, ans=0.125 2024-08-15 08:06:09,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3085670.0, ans=0.0 2024-08-15 08:06:13,680 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-15 08:06:15,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.224e+01 2.413e+01 2.831e+01 9.655e+01, threshold=4.827e+01, percent-clipped=1.0 2024-08-15 08:06:16,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-15 08:06:18,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4250, loss[loss=0.1012, beats_loss=0.0108, ecapa_loss=0.0001216, whisper_loss=0.08917, over 19204.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001525, whisper_loss=0.09094, over 3873237.11 frames. 
], batch size: 77, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:06:25,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3085770.0, ans=0.2 2024-08-15 08:06:32,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3085870.0, ans=0.0 2024-08-15 08:06:46,135 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:07:03,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3086070.0, ans=0.125 2024-08-15 08:07:18,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3086170.0, ans=0.125 2024-08-15 08:07:19,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2024-08-15 08:07:26,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3086170.0, ans=0.2 2024-08-15 08:07:28,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4300, loss[loss=0.1175, beats_loss=0.01025, ecapa_loss=0.0001301, whisper_loss=0.1059, over 24384.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001524, whisper_loss=0.09135, over 3892568.42 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:07:46,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3086370.0, ans=0.1 2024-08-15 08:08:02,024 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 08:08:03,347 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 08:08:11,635 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 08:08:13,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3086570.0, ans=0.125 2024-08-15 08:08:21,655 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 08:08:35,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.282e+01 2.452e+01 2.695e+01 5.506e+01, threshold=4.904e+01, percent-clipped=1.0 2024-08-15 08:08:38,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4350, loss[loss=0.08757, beats_loss=0.0109, ecapa_loss=0.0001059, whisper_loss=0.07562, over 16492.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001524, whisper_loss=0.08979, over 3861595.00 frames. ], batch size: 63, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:08:53,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3086870.0, ans=0.07 2024-08-15 08:09:23,483 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 08:09:25,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3087070.0, ans=0.125 2024-08-15 08:09:46,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4400, loss[loss=0.09004, beats_loss=0.01356, ecapa_loss=0.0001294, whisper_loss=0.07519, over 18193.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001521, whisper_loss=0.0896, over 3863148.50 frames. 
], batch size: 73, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:09:47,309 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.483e+01 2024-08-15 08:10:12,836 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 08:10:18,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3087470.0, ans=0.125 2024-08-15 08:10:20,919 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 08:10:26,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3087570.0, ans=0.2 2024-08-15 08:10:47,114 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 30 from Vox, 19 fro AS 2024-08-15 08:10:48,462 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 08:10:48,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3087670.0, ans=0.125 2024-08-15 08:10:52,410 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.331e+01 2.564e+01 2.900e+01 4.263e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-15 08:10:55,208 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4450, loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001527, whisper_loss=0.0907, over 21871.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001514, whisper_loss=0.08951, over 3837627.55 frames. ], batch size: 89, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:11:01,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3087770.0, ans=0.125 2024-08-15 08:11:13,461 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 08:11:20,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3087870.0, ans=0.125 2024-08-15 08:11:37,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3088070.0, ans=0.125 2024-08-15 08:11:50,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.09 vs. limit=22.5 2024-08-15 08:11:57,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3088170.0, ans=0.125 2024-08-15 08:12:05,041 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4500, loss[loss=0.09864, beats_loss=0.01073, ecapa_loss=0.0001289, whisper_loss=0.08662, over 14624.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001502, whisper_loss=0.09002, over 3872275.45 frames. ], batch size: 53, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:12:13,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-15 08:12:20,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3088370.0, ans=0.0 2024-08-15 08:12:22,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2024-08-15 08:12:34,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3088470.0, ans=0.125 2024-08-15 08:12:47,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. 
limit=15.0 2024-08-15 08:12:51,084 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-15 08:13:00,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3088670.0, ans=0.125 2024-08-15 08:13:04,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3088670.0, ans=0.125 2024-08-15 08:13:11,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.272e+01 2.536e+01 2.739e+01 4.209e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-15 08:13:13,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4550, loss[loss=0.0806, beats_loss=0.01463, ecapa_loss=0.0001546, whisper_loss=0.06443, over 18272.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001515, whisper_loss=0.09026, over 3865070.02 frames. ], batch size: 74, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:13:14,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3088770.0, ans=0.0 2024-08-15 08:13:29,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=12.0 2024-08-15 08:13:32,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3088870.0, ans=0.0 2024-08-15 08:13:33,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3088870.0, ans=0.04949747468305833 2024-08-15 08:13:50,708 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 08:13:53,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. 
limit=10.0 2024-08-15 08:14:01,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3089070.0, ans=0.125 2024-08-15 08:14:02,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3089070.0, ans=0.0 2024-08-15 08:14:04,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3089070.0, ans=0.0 2024-08-15 08:14:21,851 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-15 08:14:23,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4600, loss[loss=0.08983, beats_loss=0.009968, ecapa_loss=0.0001626, whisper_loss=0.07823, over 17861.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001508, whisper_loss=0.09125, over 3880973.76 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:14:29,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3089270.0, ans=0.125 2024-08-15 08:14:41,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2024-08-15 08:15:01,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3089470.0, ans=0.2 2024-08-15 08:15:08,317 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-15 08:15:08,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3089570.0, ans=0.125 2024-08-15 08:15:27,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3089670.0, ans=0.025 2024-08-15 08:15:29,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3089670.0, ans=0.125 2024-08-15 08:15:34,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.352e+01 2.617e+01 2.995e+01 7.008e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-15 08:15:37,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4650, loss[loss=0.1302, beats_loss=0.007312, ecapa_loss=0.0001654, whisper_loss=0.1213, over 15397.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001514, whisper_loss=0.09081, over 3871733.34 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:15:39,535 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 08:16:08,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3089970.0, ans=0.1 2024-08-15 08:16:09,346 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 08:16:19,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3089970.0, ans=0.0 2024-08-15 08:16:24,385 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 08:16:54,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4700, loss[loss=0.09804, beats_loss=0.01003, ecapa_loss=0.0001865, whisper_loss=0.08614, over 19232.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001511, whisper_loss=0.09026, over 3867695.45 frames. ], batch size: 79, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:16:56,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3090270.0, ans=0.2 2024-08-15 08:17:20,514 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 08:17:24,851 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 08:17:52,126 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 08:17:55,476 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 08:18:00,140 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 08:18:10,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3090670.0, ans=0.2 2024-08-15 08:18:12,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.393e+01 2.597e+01 3.087e+01 7.330e+01, threshold=5.194e+01, percent-clipped=1.0 2024-08-15 08:18:12,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3090670.0, ans=0.125 2024-08-15 08:18:15,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4750, loss[loss=0.09039, beats_loss=0.01244, ecapa_loss=0.0001401, whisper_loss=0.07655, over 21456.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.000151, whisper_loss=0.0906, over 3858413.80 frames. 
], batch size: 89, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:18:18,924 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 08:18:25,250 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 08:18:40,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3090870.0, ans=0.125 2024-08-15 08:18:48,537 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 33 from Vox, 27 fro AS 2024-08-15 08:18:57,372 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 08:18:59,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-15 08:19:03,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3091070.0, ans=0.125 2024-08-15 08:19:08,708 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 08:19:16,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3091170.0, ans=0.125 2024-08-15 08:19:26,858 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 08:19:32,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4800, loss[loss=0.1024, beats_loss=0.01239, ecapa_loss=0.0001408, whisper_loss=0.08859, over 22737.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001525, whisper_loss=0.09118, over 3883023.39 frames. 
], batch size: 92, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:19:47,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2024-08-15 08:19:54,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.81 vs. limit=22.5 2024-08-15 08:20:06,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3091470.0, ans=0.125 2024-08-15 08:20:13,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.21 vs. limit=10.0 2024-08-15 08:20:14,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3091470.0, ans=0.125 2024-08-15 08:20:17,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-08-15 08:20:22,213 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 08:20:23,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3091570.0, ans=0.1 2024-08-15 08:20:28,513 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:20:32,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. 
limit=15.0 2024-08-15 08:20:50,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.389e+01 2.640e+01 2.976e+01 3.395e+02, threshold=5.281e+01, percent-clipped=5.0 2024-08-15 08:20:52,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4850, loss[loss=0.09982, beats_loss=0.009631, ecapa_loss=0.0001761, whisper_loss=0.08843, over 19303.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001524, whisper_loss=0.09051, over 3909814.79 frames. ], batch size: 82, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:20:54,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3091770.0, ans=0.125 2024-08-15 08:20:55,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3091770.0, ans=0.0 2024-08-15 08:20:57,887 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 08:21:26,625 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 08:21:37,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3091970.0, ans=0.2 2024-08-15 08:22:12,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4900, loss[loss=0.1244, beats_loss=0.009806, ecapa_loss=0.0001524, whisper_loss=0.113, over 23077.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001525, whisper_loss=0.09064, over 3891003.90 frames. 
], batch size: 91, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:22:20,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3092270.0, ans=0.125 2024-08-15 08:22:27,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092370.0, ans=0.1 2024-08-15 08:22:30,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092370.0, ans=0.1 2024-08-15 08:22:33,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3092370.0, ans=0.1 2024-08-15 08:22:57,842 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-15 08:23:04,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3092570.0, ans=0.0 2024-08-15 08:23:07,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3092570.0, ans=0.1 2024-08-15 08:23:24,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3092670.0, ans=0.0 2024-08-15 08:23:30,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.221e+01 2.391e+01 2.667e+01 3.893e+01, threshold=4.783e+01, percent-clipped=0.0 2024-08-15 08:23:32,557 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 4950, loss[loss=0.0905, beats_loss=0.01158, ecapa_loss=0.0001595, whisper_loss=0.07733, over 20084.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001521, whisper_loss=0.09057, over 3918851.39 frames. 
], batch size: 82, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:23:40,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3092770.0, ans=0.2 2024-08-15 08:23:46,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3092870.0, ans=0.2 2024-08-15 08:23:47,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3092870.0, ans=0.0 2024-08-15 08:23:59,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2024-08-15 08:24:01,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3092970.0, ans=0.125 2024-08-15 08:24:04,248 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 08:24:04,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3092970.0, ans=15.0 2024-08-15 08:24:09,199 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 08:24:13,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3092970.0, ans=0.2 2024-08-15 08:24:23,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. 
limit=10.0 2024-08-15 08:24:33,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3093170.0, ans=15.0 2024-08-15 08:24:48,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5000, loss[loss=0.08066, beats_loss=0.01326, ecapa_loss=0.0001228, whisper_loss=0.06617, over 14962.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001528, whisper_loss=0.09069, over 3852233.77 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:25:18,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=12.0 2024-08-15 08:25:21,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2024-08-15 08:25:37,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3093570.0, ans=0.125 2024-08-15 08:25:40,500 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 08:25:55,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3093670.0, ans=0.125 2024-08-15 08:25:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3093670.0, ans=0.125 2024-08-15 08:25:58,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3093670.0, ans=0.2 2024-08-15 08:26:03,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3093670.0, ans=0.0 2024-08-15 08:26:07,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.349e+01 2.649e+01 2.919e+01 1.466e+02, threshold=5.298e+01, percent-clipped=4.0 2024-08-15 08:26:09,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5050, loss[loss=0.123, beats_loss=0.008001, ecapa_loss=0.0001541, whisper_loss=0.1135, over 17958.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001534, whisper_loss=0.09099, over 3869884.84 frames. ], batch size: 66, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:26:15,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3093770.0, ans=0.125 2024-08-15 08:26:19,273 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-15 08:26:25,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3093870.0, ans=0.1 2024-08-15 08:26:27,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3093870.0, ans=0.05 2024-08-15 08:26:30,369 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 08:26:32,553 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 08:26:47,266 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 08:27:14,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3094170.0, ans=0.0 2024-08-15 08:27:15,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094170.0, ans=0.1 2024-08-15 08:27:17,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3094170.0, ans=0.0 2024-08-15 08:27:21,667 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 08:27:30,214 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5100, loss[loss=0.1136, beats_loss=0.009939, ecapa_loss=0.0001628, whisper_loss=0.102, over 16257.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001536, whisper_loss=0.09099, over 3874503.87 frames. ], batch size: 64, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:27:36,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3094270.0, ans=0.0 2024-08-15 08:27:41,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3094270.0, ans=0.125 2024-08-15 08:27:46,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.63 vs. 
limit=22.5 2024-08-15 08:27:52,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3094370.0, ans=10.0 2024-08-15 08:27:56,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3094370.0, ans=0.125 2024-08-15 08:28:13,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3094470.0, ans=0.125 2024-08-15 08:28:18,396 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 08:28:50,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.353e+01 2.564e+01 3.013e+01 4.662e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-15 08:28:51,406 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5150, loss[loss=0.1039, beats_loss=0.01018, ecapa_loss=0.0001804, whisper_loss=0.09192, over 19056.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001531, whisper_loss=0.09121, over 3875440.14 frames. ], batch size: 78, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:28:53,462 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 08:29:06,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3094870.0, ans=0.0 2024-08-15 08:29:06,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3094870.0, ans=0.2 2024-08-15 08:29:13,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3094870.0, ans=0.125 2024-08-15 08:29:41,759 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 08:29:43,725 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 08:29:43,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3095070.0, ans=10.0 2024-08-15 08:29:55,257 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 08:30:12,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5200, loss[loss=0.1031, beats_loss=0.01274, ecapa_loss=0.0001183, whisper_loss=0.08917, over 21188.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001511, whisper_loss=0.09119, over 3856994.90 frames. ], batch size: 85, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:30:32,483 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 08:30:43,172 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 08:31:09,061 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 31 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 08:31:09,350 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.066e-03 2024-08-15 08:31:11,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3095570.0, ans=0.125 2024-08-15 08:31:18,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3095670.0, ans=0.125 2024-08-15 08:31:20,532 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
29 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 08:31:20,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3095670.0, ans=0.125 2024-08-15 08:31:25,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=15.0 2024-08-15 08:31:26,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2024-08-15 08:31:31,057 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.296e+01 2.558e+01 2.889e+01 4.447e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-15 08:31:32,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5250, loss[loss=0.07354, beats_loss=0.01246, ecapa_loss=0.0001495, whisper_loss=0.05959, over 20559.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001511, whisper_loss=0.09093, over 3829702.19 frames. ], batch size: 87, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:31:34,999 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 08:31:37,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3095770.0, ans=15.0 2024-08-15 08:31:59,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3095870.0, ans=0.035 2024-08-15 08:32:26,766 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 08:32:38,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3096170.0, ans=0.125 2024-08-15 08:32:40,893 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 08:32:47,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3096170.0, ans=0.125 2024-08-15 08:32:51,565 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5300, loss[loss=0.0888, beats_loss=0.01241, ecapa_loss=0.0001693, whisper_loss=0.0747, over 12882.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.000151, whisper_loss=0.09047, over 3832572.83 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:33:14,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3096370.0, ans=0.2 2024-08-15 08:33:22,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3096470.0, ans=0.2 2024-08-15 08:33:27,060 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 08:33:33,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3096470.0, ans=0.0 2024-08-15 08:33:35,120 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-15 08:33:41,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3096570.0, ans=0.125 2024-08-15 08:33:45,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096570.0, ans=0.1 2024-08-15 08:34:06,625 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
26 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 08:34:11,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.309e+01 2.587e+01 2.813e+01 1.007e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-15 08:34:12,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5350, loss[loss=0.1036, beats_loss=0.01232, ecapa_loss=0.0001292, whisper_loss=0.09002, over 22749.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001508, whisper_loss=0.09109, over 3875802.26 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:34:19,560 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 08:34:34,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3096870.0, ans=0.0 2024-08-15 08:34:36,103 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 33 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 08:34:49,044 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
21 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-15 08:34:53,916 WARNING [optim.py:496] (0/4) Scaling gradients by 0.02987569198012352, model_norm_threshold=51.74189376831055 2024-08-15 08:34:54,104 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.299e+06, grad_sumsq=1.297e+08, orig_rms_sq=1.001e-02 2024-08-15 08:35:12,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3097070.0, ans=0.125 2024-08-15 08:35:16,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3097170.0, ans=0.125 2024-08-15 08:35:34,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5400, loss[loss=0.1153, beats_loss=0.009331, ecapa_loss=0.0001557, whisper_loss=0.1045, over 22373.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0105, ecapa_loss=0.0001509, whisper_loss=0.09177, over 3859097.33 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:35:40,965 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 08:35:56,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3097370.0, ans=0.2 2024-08-15 08:36:09,202 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 08:36:11,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3097370.0, ans=0.125 2024-08-15 08:36:14,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3097470.0, ans=0.015 2024-08-15 08:36:18,048 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
27 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 08:36:39,082 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-15 08:36:44,857 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07409150153398514, model_norm_threshold=51.74189376831055 2024-08-15 08:36:45,037 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.947e+04, grad_sumsq=3.947e+04, orig_rms_sq=1.000e+00 2024-08-15 08:36:57,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3097670.0, ans=0.0 2024-08-15 08:36:59,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.348e+01 2.686e+01 2.991e+01 1.732e+03, threshold=5.372e+01, percent-clipped=3.0 2024-08-15 08:37:00,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5450, loss[loss=0.08973, beats_loss=0.01305, ecapa_loss=0.0001729, whisper_loss=0.07495, over 21765.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001518, whisper_loss=0.09107, over 3876725.31 frames. ], batch size: 93, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:37:06,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3097770.0, ans=0.125 2024-08-15 08:37:10,275 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 08:37:10,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3097770.0, ans=0.125 2024-08-15 08:37:12,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3097770.0, ans=0.125 2024-08-15 08:37:29,398 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 08:38:43,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5500, loss[loss=0.09836, beats_loss=0.009833, ecapa_loss=0.0001306, whisper_loss=0.08722, over 16398.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001514, whisper_loss=0.09172, over 3894110.31 frames. ], batch size: 64, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:38:46,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3098270.0, ans=0.0 2024-08-15 08:39:15,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-08-15 08:39:16,295 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 11 from Vox, 49 fro AS 2024-08-15 08:39:23,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3098470.0, ans=0.125 2024-08-15 08:39:30,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3098470.0, ans=0.125 2024-08-15 08:39:37,847 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-15 08:40:00,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3098670.0, ans=0.1 2024-08-15 08:40:24,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.188e+01 2.412e+01 2.681e+01 4.153e+01, threshold=4.824e+01, percent-clipped=0.0 2024-08-15 08:40:28,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5550, loss[loss=0.1256, beats_loss=0.009297, ecapa_loss=0.0001348, whisper_loss=0.115, over 18907.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01056, ecapa_loss=0.0001505, whisper_loss=0.09199, over 3917029.80 frames. 
], batch size: 72, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:40:30,460 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 08:41:13,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3098870.0, ans=0.125 2024-08-15 08:41:32,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-08-15 08:41:36,017 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 08:41:43,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3099070.0, ans=0.0 2024-08-15 08:41:46,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3099070.0, ans=0.0 2024-08-15 08:42:06,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3099170.0, ans=0.125 2024-08-15 08:42:08,632 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 08:42:22,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3099170.0, ans=0.1 2024-08-15 08:42:29,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5600, loss[loss=0.09521, beats_loss=0.0104, ecapa_loss=0.0001504, whisper_loss=0.0833, over 20905.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.000151, whisper_loss=0.09142, over 3919504.71 frames. ], batch size: 85, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:42:35,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.23 vs. 
limit=22.5 2024-08-15 08:42:42,792 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 08:43:40,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3099470.0, ans=0.5 2024-08-15 08:43:58,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-15 08:44:22,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3099670.0, ans=0.125 2024-08-15 08:44:35,210 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.363e+01 2.562e+01 2.966e+01 4.681e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-15 08:44:36,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5650, loss[loss=0.09506, beats_loss=0.01092, ecapa_loss=0.0001473, whisper_loss=0.08267, over 18812.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001496, whisper_loss=0.09162, over 3937073.22 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:44:47,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3099770.0, ans=0.125 2024-08-15 08:45:25,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3099970.0, ans=0.125 2024-08-15 08:45:31,720 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 08:45:44,996 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 08:45:47,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3100070.0, ans=0.0 2024-08-15 08:45:49,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3100070.0, ans=0.2 2024-08-15 08:46:01,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3100070.0, ans=0.125 2024-08-15 08:46:14,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3100170.0, ans=0.125 2024-08-15 08:46:23,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5700, loss[loss=0.09244, beats_loss=0.01213, ecapa_loss=0.0001535, whisper_loss=0.07878, over 22709.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001502, whisper_loss=0.09102, over 3950906.74 frames. ], batch size: 95, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:46:31,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3100270.0, ans=0.125 2024-08-15 08:46:35,732 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-15 08:46:37,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3100370.0, ans=0.125 2024-08-15 08:46:38,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3100370.0, ans=0.2 2024-08-15 08:46:42,163 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 08:47:00,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3100470.0, ans=0.125 2024-08-15 08:47:05,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2024-08-15 08:47:30,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-15 08:47:43,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.415e+01 2.624e+01 2.975e+01 2.275e+02, threshold=5.249e+01, percent-clipped=3.0 2024-08-15 08:47:44,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5750, loss[loss=0.0943, beats_loss=0.009837, ecapa_loss=0.000132, whisper_loss=0.08314, over 15956.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001494, whisper_loss=0.09097, over 3953680.65 frames. ], batch size: 61, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:47:55,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3100770.0, ans=0.125 2024-08-15 08:48:10,031 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 08:48:32,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=12.0 2024-08-15 08:48:41,783 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 08:48:45,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3101070.0, ans=0.125 2024-08-15 08:48:50,659 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
29 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 08:48:55,033 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 08:49:03,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3101170.0, ans=0.0 2024-08-15 08:49:05,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5800, loss[loss=0.07546, beats_loss=0.0128, ecapa_loss=0.0001799, whisper_loss=0.06087, over 20820.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001508, whisper_loss=0.09102, over 3917808.79 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:49:18,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3101270.0, ans=0.125 2024-08-15 08:49:24,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3101370.0, ans=0.125 2024-08-15 08:49:26,599 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 08:49:37,824 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 08:49:46,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.82 vs. 
limit=22.5 2024-08-15 08:49:48,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101470.0, ans=0.125 2024-08-15 08:49:55,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3101570.0, ans=0.1 2024-08-15 08:49:57,149 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:50:23,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.392e+01 2.742e+01 3.107e+01 2.079e+02, threshold=5.485e+01, percent-clipped=4.0 2024-08-15 08:50:24,104 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 08:50:24,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3101770.0, ans=0.0 2024-08-15 08:50:24,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-15 08:50:25,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5850, loss[loss=0.09177, beats_loss=0.01079, ecapa_loss=0.0001665, whisper_loss=0.07932, over 17485.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001499, whisper_loss=0.09091, over 3921363.54 frames. 
], batch size: 74, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:50:38,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3101770.0, ans=0.035 2024-08-15 08:50:43,949 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:50:49,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-08-15 08:50:51,573 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-15 08:51:20,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3102070.0, ans=0.125 2024-08-15 08:51:23,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3102070.0, ans=0.125 2024-08-15 08:51:42,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3102170.0, ans=0.125 2024-08-15 08:51:44,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5900, loss[loss=0.121, beats_loss=0.009147, ecapa_loss=0.0001568, whisper_loss=0.1103, over 23356.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001503, whisper_loss=0.09112, over 3925217.00 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:51:51,328 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
21 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-15 08:51:52,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3102270.0, ans=0.0 2024-08-15 08:52:08,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3102370.0, ans=0.125 2024-08-15 08:52:09,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3102370.0, ans=0.1 2024-08-15 08:52:10,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-15 08:52:13,231 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-15 08:52:15,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3102470.0, ans=0.0 2024-08-15 08:52:35,728 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 08:52:37,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3102570.0, ans=0.1 2024-08-15 08:52:57,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3102670.0, ans=0.2 2024-08-15 08:52:59,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.266e+01 2.479e+01 2.808e+01 3.444e+02, threshold=4.958e+01, percent-clipped=1.0 2024-08-15 08:53:01,408 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 5950, loss[loss=0.07664, beats_loss=0.01107, ecapa_loss=0.0001891, whisper_loss=0.06368, over 14168.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001499, whisper_loss=0.09065, over 3937361.39 frames. 
], batch size: 57, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:53:04,588 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 08:53:04,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3102770.0, ans=0.0 2024-08-15 08:53:09,164 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-15 08:53:12,712 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 08:53:12,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3102770.0, ans=0.125 2024-08-15 08:53:21,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3102870.0, ans=0.125 2024-08-15 08:53:24,402 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-15 08:53:42,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-08-15 08:53:59,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3103070.0, ans=0.1 2024-08-15 08:54:02,029 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 08:54:02,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-15 08:54:16,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. 
limit=15.0 2024-08-15 08:54:17,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3103270.0, ans=0.125 2024-08-15 08:54:18,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6000, loss[loss=0.08685, beats_loss=0.01239, ecapa_loss=0.0001469, whisper_loss=0.07298, over 20988.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001506, whisper_loss=0.09081, over 3921104.27 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:54:18,700 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 08:54:59,176 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on ASR_libri: loss=0.2524, beats_loss=0, ecapa_loss=0.0005326, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 08:55:14,770 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on SV_voxceleb1: loss=0.004204, beats_loss=0, ecapa_loss=0.0004204, whisper_loss=0, over 939242.00 frames. 2024-08-15 08:57:13,977 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 08:57:13,981 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 08:57:16,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3103270.0, ans=0.0 2024-08-15 08:57:24,597 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-15 08:57:24,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3103270.0, ans=0.125 2024-08-15 08:57:44,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3103470.0, ans=0.2 2024-08-15 08:57:52,424 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 08:58:12,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3103570.0, ans=0.125 2024-08-15 08:58:14,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3103670.0, ans=0.125 2024-08-15 08:58:28,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.350e+01 2.585e+01 2.886e+01 6.077e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-15 08:58:28,867 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 08:58:29,111 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:58:30,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6050, loss[loss=0.1066, beats_loss=0.008769, ecapa_loss=0.0001549, whisper_loss=0.09633, over 15857.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001505, whisper_loss=0.09024, over 3880631.46 frames. ], batch size: 61, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:58:35,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3103770.0, ans=0.1 2024-08-15 08:58:40,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3103770.0, ans=0.125 2024-08-15 08:58:46,425 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.908e+00 2024-08-15 08:58:50,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2024-08-15 08:58:53,557 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
19 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 08:58:55,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3103870.0, ans=0.2 2024-08-15 08:58:58,104 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 08:58:59,577 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 08:59:14,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3104070.0, ans=0.0 2024-08-15 08:59:23,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2024-08-15 08:59:26,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-08-15 08:59:34,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2024-08-15 08:59:36,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3104170.0, ans=0.125 2024-08-15 08:59:45,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6100, loss[loss=0.1116, beats_loss=0.0101, ecapa_loss=0.0001234, whisper_loss=0.1003, over 15340.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01069, ecapa_loss=0.0001506, whisper_loss=0.08982, over 3864427.19 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:59:50,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.40 vs. 
limit=15.0 2024-08-15 09:00:00,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3104370.0, ans=0.125 2024-08-15 09:00:26,427 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 13 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 09:00:47,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3104670.0, ans=0.05 2024-08-15 09:00:56,179 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 09:00:56,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-15 09:00:57,410 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.233e+01 2.517e+01 2.744e+01 4.126e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-15 09:00:58,746 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6150, loss[loss=0.0951, beats_loss=0.01238, ecapa_loss=0.0001645, whisper_loss=0.08107, over 20903.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.0001528, whisper_loss=0.08949, over 3852383.72 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:01:01,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3104770.0, ans=0.125 2024-08-15 09:01:05,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3104770.0, ans=0.125 2024-08-15 09:01:41,082 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 09:01:53,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. 
limit=15.0 2024-08-15 09:01:57,077 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 09:02:00,046 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 09:02:13,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6200, loss[loss=0.0925, beats_loss=0.01202, ecapa_loss=0.000153, whisper_loss=0.07895, over 19617.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001512, whisper_loss=0.08981, over 3844742.43 frames. ], batch size: 79, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:02:22,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3105270.0, ans=0.125 2024-08-15 09:02:26,305 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 09:02:35,194 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 09:02:35,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3105370.0, ans=0.0 2024-08-15 09:02:43,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3105470.0, ans=0.125 2024-08-15 09:02:46,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3105470.0, ans=0.0 2024-08-15 09:02:55,831 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 09:03:23,235 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 09:03:26,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.264e+01 2.437e+01 2.763e+01 4.898e+01, threshold=4.875e+01, percent-clipped=0.0 2024-08-15 09:03:28,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6250, loss[loss=0.1079, beats_loss=0.01062, ecapa_loss=0.0001414, whisper_loss=0.09592, over 21168.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01074, ecapa_loss=0.0001517, whisper_loss=0.08919, over 3856185.97 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:04:02,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3105970.0, ans=0.125 2024-08-15 09:04:11,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3105970.0, ans=0.125 2024-08-15 09:04:20,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3106070.0, ans=0.125 2024-08-15 09:04:44,382 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 09:04:45,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6300, loss[loss=0.1048, beats_loss=0.01116, ecapa_loss=0.0001458, whisper_loss=0.09221, over 22879.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01077, ecapa_loss=0.0001504, whisper_loss=0.08961, over 3868428.66 frames. ], batch size: 93, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:04:48,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2024-08-15 09:04:55,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. 
limit=22.5 2024-08-15 09:05:14,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3106370.0, ans=0.2 2024-08-15 09:05:14,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3106370.0, ans=0.95 2024-08-15 09:05:28,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3106470.0, ans=0.125 2024-08-15 09:05:40,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3106570.0, ans=0.1 2024-08-15 09:05:47,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3106570.0, ans=0.125 2024-08-15 09:05:49,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3106570.0, ans=0.125 2024-08-15 09:05:54,399 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 09:05:57,457 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 09:06:07,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.366e+01 2.627e+01 2.982e+01 5.649e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-15 09:06:09,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6350, loss[loss=0.09864, beats_loss=0.01026, ecapa_loss=0.0001431, whisper_loss=0.08695, over 14767.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01073, ecapa_loss=0.0001511, whisper_loss=0.08974, over 3866738.71 frames. ], batch size: 58, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:06:12,055 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 09:06:12,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3106770.0, ans=0.125 2024-08-15 09:06:21,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3106770.0, ans=0.1 2024-08-15 09:06:34,605 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 34 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 09:06:38,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3106870.0, ans=0.1 2024-08-15 09:06:39,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3106870.0, ans=0.0 2024-08-15 09:06:40,950 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 09:06:45,574 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-15 09:06:49,875 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 09:06:53,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-15 09:06:54,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3106970.0, ans=0.2 2024-08-15 09:07:07,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3107070.0, ans=0.0 2024-08-15 09:07:30,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6400, loss[loss=0.0997, beats_loss=0.01071, ecapa_loss=0.0001388, whisper_loss=0.08761, over 20396.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01076, ecapa_loss=0.0001507, whisper_loss=0.08933, over 3884297.59 frames. 
], batch size: 79, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:08:06,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3107470.0, ans=0.125 2024-08-15 09:08:14,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3107470.0, ans=0.125 2024-08-15 09:08:23,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3107570.0, ans=0.125 2024-08-15 09:08:26,853 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 09:08:28,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.050e-02 2024-08-15 09:08:32,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3107570.0, ans=0.0 2024-08-15 09:08:39,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3107670.0, ans=0.07 2024-08-15 09:08:42,571 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 39 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 09:08:48,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3107670.0, ans=0.0 2024-08-15 09:08:51,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3107670.0, ans=0.0 2024-08-15 09:08:52,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.320e+01 2.533e+01 2.838e+01 5.335e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-15 09:08:54,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6450, loss[loss=0.1209, beats_loss=0.01055, ecapa_loss=0.0001344, whisper_loss=0.109, over 23314.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001501, whisper_loss=0.09003, over 3886005.96 frames. ], batch size: 91, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:09:08,062 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 09:09:32,156 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-15 09:09:45,218 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 09:09:57,263 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 09:10:03,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3108170.0, ans=0.125 2024-08-15 09:10:06,634 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 09:10:16,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6500, loss[loss=0.1089, beats_loss=0.0118, ecapa_loss=0.0001176, whisper_loss=0.09588, over 22408.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001497, whisper_loss=0.09079, over 3891341.53 frames. 
], batch size: 86, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:10:26,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3108270.0, ans=0.125 2024-08-15 09:10:55,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3108470.0, ans=0.09899494936611666 2024-08-15 09:11:07,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3108570.0, ans=0.2 2024-08-15 09:11:18,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3108670.0, ans=0.0 2024-08-15 09:11:19,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3108670.0, ans=0.0 2024-08-15 09:11:25,707 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 09:11:30,689 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 09:11:31,947 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 09:11:33,007 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.377e+01 2.603e+01 2.970e+01 3.973e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 09:11:33,515 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 09:11:34,752 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6550, loss[loss=0.1228, beats_loss=0.00938, ecapa_loss=0.000133, whisper_loss=0.1121, over 23105.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001495, whisper_loss=0.0912, over 3928960.54 frames. 
], batch size: 87, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:11:35,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3108770.0, ans=0.0 2024-08-15 09:11:47,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3108770.0, ans=0.2 2024-08-15 09:11:51,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2024-08-15 09:12:19,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.48 vs. limit=10.0 2024-08-15 09:12:27,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=15.0 2024-08-15 09:12:32,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3109070.0, ans=0.125 2024-08-15 09:12:33,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3109070.0, ans=0.125 2024-08-15 09:12:53,750 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6600, loss[loss=0.09743, beats_loss=0.01277, ecapa_loss=0.0001322, whisper_loss=0.08334, over 22105.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001502, whisper_loss=0.09144, over 3935640.90 frames. 
], batch size: 88, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:13:07,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3109270.0, ans=0.125 2024-08-15 09:13:17,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3109370.0, ans=0.0 2024-08-15 09:13:37,458 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 09:13:56,378 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 09:14:07,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3109670.0, ans=0.2 2024-08-15 09:14:10,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.330e+01 2.492e+01 2.798e+01 4.030e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 09:14:11,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6650, loss[loss=0.09906, beats_loss=0.00883, ecapa_loss=0.000188, whisper_loss=0.08835, over 15491.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001507, whisper_loss=0.09089, over 3930589.82 frames. ], batch size: 63, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:14:18,599 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 09:14:22,134 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 09:14:27,490 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
14 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 09:14:31,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3109870.0, ans=0.0 2024-08-15 09:14:38,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3109870.0, ans=0.0 2024-08-15 09:14:58,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3109970.0, ans=0.125 2024-08-15 09:15:24,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3110170.0, ans=0.125 2024-08-15 09:15:30,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3110270.0, ans=0.125 2024-08-15 09:15:31,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6700, loss[loss=0.1115, beats_loss=0.009095, ecapa_loss=0.0001529, whisper_loss=0.1009, over 23210.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001509, whisper_loss=0.09049, over 3900326.96 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:15:33,616 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 8 from Vox, 33 fro AS 2024-08-15 09:15:50,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.35 vs. limit=6.0 2024-08-15 09:16:07,087 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 09:16:09,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-15 09:16:25,631 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 09:16:34,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3110570.0, ans=0.0 2024-08-15 09:16:40,113 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 09:16:41,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3110670.0, ans=0.125 2024-08-15 09:16:45,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3110670.0, ans=0.125 2024-08-15 09:16:55,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.348e+01 2.579e+01 2.866e+01 4.401e+01, threshold=5.159e+01, percent-clipped=0.0 2024-08-15 09:16:56,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6750, loss[loss=0.1061, beats_loss=0.01065, ecapa_loss=0.0001503, whisper_loss=0.09393, over 21972.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.09044, over 3870581.02 frames. 
], batch size: 89, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:17:03,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3110770.0, ans=0.0 2024-08-15 09:17:16,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3110870.0, ans=0.0 2024-08-15 09:17:19,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3110870.0, ans=0.1 2024-08-15 09:17:25,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3110870.0, ans=0.125 2024-08-15 09:18:09,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111170.0, ans=0.1 2024-08-15 09:18:21,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6800, loss[loss=0.1111, beats_loss=0.009863, ecapa_loss=0.0001884, whisper_loss=0.09939, over 20947.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001529, whisper_loss=0.09033, over 3874539.81 frames. ], batch size: 88, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:18:40,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2024-08-15 09:18:49,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3111370.0, ans=0.09899494936611666 2024-08-15 09:18:51,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3111370.0, ans=0.2 2024-08-15 09:18:57,277 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 09:19:03,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3111470.0, ans=0.0 2024-08-15 09:19:05,300 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 09:19:12,277 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 09:19:21,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3111570.0, ans=0.125 2024-08-15 09:19:26,249 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 09:19:33,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3111670.0, ans=0.2 2024-08-15 09:19:37,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3111670.0, ans=0.125 2024-08-15 09:19:41,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.366e+01 2.737e+01 3.020e+01 4.133e+01, threshold=5.473e+01, percent-clipped=0.0 2024-08-15 09:19:43,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6850, loss[loss=0.1059, beats_loss=0.008924, ecapa_loss=0.0001413, whisper_loss=0.0956, over 18411.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001527, whisper_loss=0.09016, over 3856427.14 frames. ], batch size: 71, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:19:58,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-15 09:20:07,168 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 09:20:34,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-15 09:21:04,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3112270.0, ans=0.0 2024-08-15 09:21:05,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6900, loss[loss=0.1074, beats_loss=0.009534, ecapa_loss=0.0001473, whisper_loss=0.0964, over 21148.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.000152, whisper_loss=0.09107, over 3859921.66 frames. ], batch size: 84, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:21:05,900 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 09:21:07,259 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-15 09:21:12,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3112270.0, ans=0.07 2024-08-15 09:21:14,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3112270.0, ans=0.0 2024-08-15 09:21:14,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3112270.0, ans=0.125 2024-08-15 09:21:17,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3112270.0, ans=0.125 2024-08-15 09:21:51,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3112470.0, ans=0.125 2024-08-15 09:22:02,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3112570.0, 
ans=0.125 2024-08-15 09:22:21,484 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-15 09:22:25,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.330e+01 2.607e+01 2.906e+01 3.903e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-15 09:22:27,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 6950, loss[loss=0.09582, beats_loss=0.009093, ecapa_loss=0.0001335, whisper_loss=0.0854, over 15008.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001505, whisper_loss=0.09089, over 3896216.93 frames. ], batch size: 57, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:22:35,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3112770.0, ans=0.1 2024-08-15 09:22:40,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112770.0, ans=0.1 2024-08-15 09:22:46,804 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 09:23:13,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3112970.0, ans=0.2 2024-08-15 09:23:18,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3113070.0, ans=0.1 2024-08-15 09:23:46,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.79 vs. limit=22.5 2024-08-15 09:23:53,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7000, loss[loss=0.1019, beats_loss=0.0111, ecapa_loss=0.00014, whisper_loss=0.08937, over 16151.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001509, whisper_loss=0.09079, over 3879022.80 frames. 
], batch size: 64, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:24:12,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3113370.0, ans=0.125 2024-08-15 09:24:14,925 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:24:34,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3113470.0, ans=0.0 2024-08-15 09:24:37,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3113470.0, ans=0.1 2024-08-15 09:24:44,969 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 09:24:48,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3113570.0, ans=0.125 2024-08-15 09:24:59,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-08-15 09:25:03,159 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 09:25:08,987 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 09:25:11,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.285e+01 2.515e+01 2.817e+01 4.322e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-15 09:25:12,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3113770.0, ans=0.125 2024-08-15 09:25:13,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7050, loss[loss=0.117, beats_loss=0.01034, ecapa_loss=0.0001421, whisper_loss=0.1052, over 21966.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001516, whisper_loss=0.09049, over 3881131.70 frames. ], batch size: 87, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:25:16,005 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:25:49,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3113970.0, ans=0.0 2024-08-15 09:26:08,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3114070.0, ans=0.125 2024-08-15 09:26:14,164 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 09:26:15,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3114070.0, ans=0.2 2024-08-15 09:26:24,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3114170.0, ans=0.125 2024-08-15 09:26:29,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3114170.0, ans=0.0 2024-08-15 09:26:37,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7100, loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001435, whisper_loss=0.08964, over 20827.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.000152, whisper_loss=0.09067, over 3903165.19 frames. ], batch size: 79, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:26:39,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3114270.0, ans=0.125 2024-08-15 09:26:42,764 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 09:27:07,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3114470.0, ans=0.125 2024-08-15 09:27:24,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3114570.0, ans=0.0 2024-08-15 09:27:35,752 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-15 09:27:37,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3114570.0, ans=0.04949747468305833 2024-08-15 09:27:45,276 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 09:27:53,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.689e+01 2.259e+01 2.510e+01 2.858e+01 3.355e+02, threshold=5.020e+01, percent-clipped=2.0 2024-08-15 09:27:55,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7150, loss[loss=0.08861, beats_loss=0.01048, ecapa_loss=0.0001558, whisper_loss=0.07657, over 20698.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.000152, whisper_loss=0.09117, over 3924290.62 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:27:58,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=12.0 2024-08-15 09:28:09,543 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 09:28:28,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3114970.0, ans=0.125 2024-08-15 09:29:18,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7200, loss[loss=0.1012, beats_loss=0.01142, ecapa_loss=0.0001597, whisper_loss=0.08817, over 18434.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01059, ecapa_loss=0.000151, whisper_loss=0.09182, over 3944956.34 frames. ], batch size: 77, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:29:20,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.43 vs. limit=22.5 2024-08-15 09:29:25,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3115270.0, ans=0.1 2024-08-15 09:29:31,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3115270.0, ans=0.2 2024-08-15 09:29:51,668 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 09:30:04,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3115470.0, ans=0.125 2024-08-15 09:30:11,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.57 vs. 
limit=12.0 2024-08-15 09:30:12,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3115570.0, ans=0.1 2024-08-15 09:30:22,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3115570.0, ans=0.125 2024-08-15 09:30:22,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3115570.0, ans=0.125 2024-08-15 09:30:26,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3115670.0, ans=0.125 2024-08-15 09:30:41,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.339e+01 2.550e+01 2.963e+01 5.481e+01, threshold=5.099e+01, percent-clipped=2.0 2024-08-15 09:30:42,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.85 vs. limit=5.0 2024-08-15 09:30:43,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7250, loss[loss=0.1128, beats_loss=0.01042, ecapa_loss=0.0001665, whisper_loss=0.1008, over 22111.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001515, whisper_loss=0.09164, over 3935087.01 frames. ], batch size: 93, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:31:06,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3115870.0, ans=12.0 2024-08-15 09:31:11,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3115870.0, ans=0.1 2024-08-15 09:31:41,965 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 09:32:01,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-15 09:32:02,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7300, loss[loss=0.09196, beats_loss=0.01067, ecapa_loss=0.0001797, whisper_loss=0.0795, over 17278.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.0001518, whisper_loss=0.09153, over 3930759.93 frames. ], batch size: 73, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:32:35,054 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 09:32:39,334 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 09:32:52,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-15 09:33:14,896 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 09:33:15,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3116670.0, ans=0.125 2024-08-15 09:33:20,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.398e+01 2.645e+01 3.010e+01 2.880e+02, threshold=5.290e+01, percent-clipped=2.0 2024-08-15 09:33:22,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7350, loss[loss=0.1005, beats_loss=0.01181, ecapa_loss=0.0001261, whisper_loss=0.08739, over 23292.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01061, ecapa_loss=0.0001518, whisper_loss=0.0915, over 3926748.43 frames. 
], batch size: 94, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:33:29,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=22.5 2024-08-15 09:33:37,888 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 09:33:49,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3116870.0, ans=0.2 2024-08-15 09:34:19,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=22.5 2024-08-15 09:34:25,662 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 09:34:26,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-15 09:34:39,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7400, loss[loss=0.07429, beats_loss=0.0147, ecapa_loss=9.669e-05, whisper_loss=0.05862, over 19913.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.0912, over 3913481.43 frames. ], batch size: 80, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:35:15,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3117470.0, ans=0.125 2024-08-15 09:35:18,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3117470.0, ans=0.0 2024-08-15 09:35:21,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. 
limit=15.0 2024-08-15 09:35:57,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. limit=5.0 2024-08-15 09:35:59,535 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.404e+01 2.626e+01 2.946e+01 5.024e+01, threshold=5.253e+01, percent-clipped=0.0 2024-08-15 09:36:01,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7450, loss[loss=0.07952, beats_loss=0.009194, ecapa_loss=0.0001369, whisper_loss=0.06896, over 17282.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001503, whisper_loss=0.09108, over 3908540.29 frames. ], batch size: 68, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:36:02,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3117770.0, ans=0.1 2024-08-15 09:36:10,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3117770.0, ans=0.125 2024-08-15 09:36:14,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0 2024-08-15 09:36:21,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.76 vs. limit=15.0 2024-08-15 09:36:40,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3117970.0, ans=0.125 2024-08-15 09:37:17,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7500, loss[loss=0.1074, beats_loss=0.01004, ecapa_loss=0.0001224, whisper_loss=0.09612, over 23346.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001513, whisper_loss=0.09126, over 3917366.92 frames. 
], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:37:35,329 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 09:37:43,930 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 09:37:47,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.48 vs. limit=6.0 2024-08-15 09:37:49,862 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-15 09:38:00,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3118470.0, ans=0.2 2024-08-15 09:38:09,976 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-15 09:38:32,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.370e+01 2.711e+01 2.994e+01 4.451e+02, threshold=5.422e+01, percent-clipped=4.0 2024-08-15 09:38:32,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7550, loss[loss=0.09009, beats_loss=0.008455, ecapa_loss=0.0001459, whisper_loss=0.08018, over 16420.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001511, whisper_loss=0.09165, over 3885315.56 frames. ], batch size: 63, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:38:54,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2024-08-15 09:38:57,082 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 09:39:07,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3118970.0, ans=0.2 2024-08-15 09:39:16,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3118970.0, ans=0.125 2024-08-15 09:39:23,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3119070.0, ans=0.2 2024-08-15 09:39:41,461 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 09:39:52,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7600, loss[loss=0.09786, beats_loss=0.01118, ecapa_loss=0.0001534, whisper_loss=0.08515, over 23733.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.0001513, whisper_loss=0.09156, over 3868720.33 frames. ], batch size: 94, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:40:56,231 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 09:41:00,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3119670.0, ans=0.0 2024-08-15 09:41:09,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.254e+01 2.450e+01 2.637e+01 4.565e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-15 09:41:09,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7650, loss[loss=0.08352, beats_loss=0.01221, ecapa_loss=0.0001327, whisper_loss=0.06998, over 16049.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.000151, whisper_loss=0.09168, over 3858053.81 frames. ], batch size: 63, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:41:14,827 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
15 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 09:41:15,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3119770.0, ans=0.125 2024-08-15 09:41:41,979 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-312000.pt 2024-08-15 09:42:18,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3120170.0, ans=0.0 2024-08-15 09:42:26,177 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 09:42:27,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7700, loss[loss=0.09746, beats_loss=0.01117, ecapa_loss=0.0001282, whisper_loss=0.085, over 18497.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.00015, whisper_loss=0.09129, over 3836892.45 frames. ], batch size: 73, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:42:45,121 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 09:42:51,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3120370.0, ans=0.0 2024-08-15 09:42:54,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3120370.0, ans=0.125 2024-08-15 09:43:19,495 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 09:43:36,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3120670.0, ans=0.125 2024-08-15 09:43:38,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3120670.0, ans=0.04949747468305833 2024-08-15 09:43:39,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3120670.0, ans=0.125 2024-08-15 09:43:47,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.349e+01 2.670e+01 3.107e+01 2.208e+02, threshold=5.341e+01, percent-clipped=1.0 2024-08-15 09:43:47,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7750, loss[loss=0.1149, beats_loss=0.008173, ecapa_loss=0.0001662, whisper_loss=0.1051, over 19098.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001502, whisper_loss=0.091, over 3832117.23 frames. ], batch size: 74, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:44:09,503 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 09:44:11,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-15 09:44:33,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. limit=5.0 2024-08-15 09:44:37,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3121070.0, ans=0.0 2024-08-15 09:44:40,191 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 09:44:41,468 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 09:44:42,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3121070.0, ans=0.035 2024-08-15 09:45:05,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7800, loss[loss=0.08775, beats_loss=0.01036, ecapa_loss=0.0001679, whisper_loss=0.07571, over 20752.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001496, whisper_loss=0.0909, over 3822069.82 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:45:23,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3121370.0, ans=0.2 2024-08-15 09:45:39,460 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 09:45:44,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3121470.0, ans=0.125 2024-08-15 09:45:52,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3121470.0, ans=0.0 2024-08-15 09:45:58,679 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 09:46:02,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3121570.0, ans=0.05 2024-08-15 09:46:14,306 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 09:46:24,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3121770.0, ans=0.0 2024-08-15 09:46:25,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.314e+01 2.626e+01 3.075e+01 3.695e+02, threshold=5.252e+01, percent-clipped=2.0 2024-08-15 09:46:25,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7850, loss[loss=0.06461, beats_loss=0.01363, ecapa_loss=0.0001762, whisper_loss=0.04922, over 13852.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001505, whisper_loss=0.09102, over 3826069.03 frames. ], batch size: 60, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:46:25,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.63 vs. limit=22.5 2024-08-15 09:46:26,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3121770.0, ans=0.125 2024-08-15 09:46:34,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-15 09:46:36,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3121770.0, ans=0.125 2024-08-15 09:46:37,934 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-15 09:46:41,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121870.0, ans=0.1 2024-08-15 09:46:54,951 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:47:03,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.79 vs. limit=10.0 2024-08-15 09:47:33,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7900, loss[loss=0.1103, beats_loss=0.008982, ecapa_loss=0.0001366, whisper_loss=0.09996, over 18131.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001502, whisper_loss=0.09106, over 3865116.90 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:47:34,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3122270.0, ans=0.125 2024-08-15 09:47:53,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=12.0 2024-08-15 09:48:01,449 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 09:48:11,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3122470.0, ans=0.125 2024-08-15 09:48:12,577 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 19 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 09:48:22,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3122570.0, ans=0.125 2024-08-15 09:48:40,464 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 09:48:44,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.277e+01 2.656e+01 2.964e+01 2.137e+02, threshold=5.312e+01, percent-clipped=1.0 2024-08-15 09:48:44,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 7950, loss[loss=0.0903, beats_loss=0.0117, ecapa_loss=0.0001295, whisper_loss=0.07731, over 16552.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001502, whisper_loss=0.09106, over 3868642.98 frames. ], batch size: 62, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:48:45,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3122770.0, ans=0.125 2024-08-15 09:48:57,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3122770.0, ans=0.125 2024-08-15 09:49:13,901 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-15 09:49:20,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.22 vs. limit=22.5 2024-08-15 09:49:24,297 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 09:49:25,575 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 09:49:25,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3122970.0, ans=0.0 2024-08-15 09:49:28,383 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 09:49:32,539 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-15 09:49:35,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3123070.0, ans=0.125 2024-08-15 09:49:50,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3123170.0, ans=0.125 2024-08-15 09:49:52,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3123170.0, ans=0.125 2024-08-15 09:49:53,493 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 14 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 09:49:58,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8000, loss[loss=0.09305, beats_loss=0.01421, ecapa_loss=0.0001521, whisper_loss=0.07731, over 14219.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001499, whisper_loss=0.09101, over 3855764.64 frames. ], batch size: 59, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:50:01,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2024-08-15 09:50:26,544 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-15 09:50:32,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3123470.0, ans=0.05 2024-08-15 09:50:40,095 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 09:50:40,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3123470.0, ans=0.2 2024-08-15 09:50:53,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3123570.0, ans=0.0 2024-08-15 09:50:59,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3123670.0, ans=0.1 2024-08-15 09:51:00,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3123670.0, ans=0.025 2024-08-15 09:51:04,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3123670.0, ans=0.1 2024-08-15 09:51:12,256 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-15 09:51:13,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.280e+01 2.544e+01 2.885e+01 5.910e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-15 09:51:13,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8050, loss[loss=0.08811, beats_loss=0.01142, ecapa_loss=0.0001243, whisper_loss=0.07545, over 13993.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.00015, whisper_loss=0.09051, over 3854075.68 frames. ], batch size: 56, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:51:17,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=22.5 2024-08-15 09:51:19,782 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 09:51:40,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-15 09:52:05,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3124070.0, ans=0.2 2024-08-15 09:52:11,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3124170.0, ans=0.125 2024-08-15 09:52:22,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3124170.0, ans=0.2 2024-08-15 09:52:25,001 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8100, loss[loss=0.09447, beats_loss=0.01144, ecapa_loss=0.0001664, whisper_loss=0.08136, over 22170.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001509, whisper_loss=0.09056, over 3875799.77 frames. ], batch size: 93, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:52:32,244 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 09:52:39,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.09 vs. limit=22.5 2024-08-15 09:52:41,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=15.0 2024-08-15 09:52:42,703 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 09:52:45,458 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 09:52:45,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3124370.0, ans=0.125 2024-08-15 09:52:45,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2024-08-15 09:52:49,837 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 13 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 09:53:16,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2024-08-15 09:53:27,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3124670.0, ans=0.0 2024-08-15 09:53:27,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3124670.0, ans=0.0 2024-08-15 09:53:32,014 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 09:53:33,437 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 09:53:40,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.386e+01 2.609e+01 2.958e+01 3.972e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-15 09:53:40,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8150, loss[loss=0.113, beats_loss=0.01012, ecapa_loss=0.0001342, whisper_loss=0.1015, over 22466.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001521, whisper_loss=0.09027, over 3877922.89 frames. 
], batch size: 88, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:53:57,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3124870.0, ans=0.125 2024-08-15 09:54:16,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3124970.0, ans=0.125 2024-08-15 09:54:41,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3125070.0, ans=0.1 2024-08-15 09:54:49,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3125170.0, ans=0.0 2024-08-15 09:54:57,787 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-15 09:54:58,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3125170.0, ans=0.07 2024-08-15 09:54:58,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-15 09:55:06,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8200, loss[loss=0.08931, beats_loss=0.0104, ecapa_loss=0.0001543, whisper_loss=0.07737, over 16940.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001523, whisper_loss=0.08962, over 3890241.15 frames. ], batch size: 68, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:55:07,071 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 09:55:36,634 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.667e-02 2024-08-15 09:55:38,145 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 09:56:05,445 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 09:56:09,818 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 09:56:20,943 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08162933588027954, model_norm_threshold=52.17145538330078 2024-08-15 09:56:21,127 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.265e+04, grad_sumsq=3.250e+06, orig_rms_sq=1.005e-02 2024-08-15 09:56:23,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.208e+01 2.504e+01 2.791e+01 6.391e+02, threshold=5.008e+01, percent-clipped=1.0 2024-08-15 09:56:23,746 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8250, loss[loss=0.1062, beats_loss=0.01098, ecapa_loss=0.0001411, whisper_loss=0.09383, over 15735.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001514, whisper_loss=0.08921, over 3866211.51 frames. ], batch size: 60, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:56:25,335 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 9 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 09:56:30,085 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 09:56:45,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3125870.0, ans=0.025 2024-08-15 09:57:17,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3126070.0, ans=0.0 2024-08-15 09:57:37,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8300, loss[loss=0.1142, beats_loss=0.01089, ecapa_loss=0.0001616, whisper_loss=0.1017, over 22255.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01074, ecapa_loss=0.0001511, whisper_loss=0.08954, over 3878259.79 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:57:44,029 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 09:57:53,200 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 09:58:05,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.45 vs. limit=22.5 2024-08-15 09:58:09,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3126470.0, ans=0.95 2024-08-15 09:58:42,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3126670.0, ans=0.125 2024-08-15 09:59:00,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.411e+01 2.673e+01 3.013e+01 2.459e+02, threshold=5.345e+01, percent-clipped=1.0 2024-08-15 09:59:00,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8350, loss[loss=0.1298, beats_loss=0.007992, ecapa_loss=0.0001844, whisper_loss=0.12, over 23693.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01077, ecapa_loss=0.0001506, whisper_loss=0.0894, over 3895193.58 frames. 
], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:59:04,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3126770.0, ans=0.0 2024-08-15 09:59:15,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126870.0, ans=0.1 2024-08-15 09:59:18,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126870.0, ans=0.1 2024-08-15 09:59:20,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.14 vs. limit=22.5 2024-08-15 09:59:24,022 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-15 09:59:36,636 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 09:59:38,144 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 09:59:47,515 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 10:00:16,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8400, loss[loss=0.1062, beats_loss=0.01128, ecapa_loss=0.0001618, whisper_loss=0.09333, over 19743.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001512, whisper_loss=0.09073, over 3906119.24 frames. ], batch size: 83, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:00:18,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3127270.0, ans=0.2 2024-08-15 10:00:27,110 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
15 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-15 10:00:33,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3127370.0, ans=0.125 2024-08-15 10:00:54,835 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 10:01:00,842 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 10:01:03,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.00 vs. limit=22.5 2024-08-15 10:01:11,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3127570.0, ans=0.125 2024-08-15 10:01:20,765 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 23 from Vox, 16 fro AS 2024-08-15 10:01:31,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.355e+01 2.603e+01 2.827e+01 7.121e+01, threshold=5.205e+01, percent-clipped=1.0 2024-08-15 10:01:31,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8450, loss[loss=0.1029, beats_loss=0.008432, ecapa_loss=0.0001866, whisper_loss=0.09264, over 13644.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.000151, whisper_loss=0.09072, over 3876927.56 frames. ], batch size: 54, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:01:33,017 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 10:01:50,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3127870.0, ans=0.2 2024-08-15 10:01:52,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3127870.0, ans=0.125 2024-08-15 10:01:53,287 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 10:01:58,355 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 10:02:12,586 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 10:02:52,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8500, loss[loss=0.09583, beats_loss=0.01038, ecapa_loss=0.0001461, whisper_loss=0.08399, over 20262.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001503, whisper_loss=0.091, over 3897074.33 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:02:59,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3128270.0, ans=0.1 2024-08-15 10:03:13,866 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 10:03:32,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3128470.0, ans=0.0 2024-08-15 10:03:37,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128470.0, ans=0.1 2024-08-15 10:03:51,957 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 10:03:56,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3128670.0, ans=0.0 2024-08-15 10:04:02,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3128670.0, ans=0.025 2024-08-15 10:04:11,097 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.400e+01 2.720e+01 3.121e+01 2.458e+02, threshold=5.440e+01, percent-clipped=2.0 2024-08-15 10:04:11,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8550, loss[loss=0.1073, beats_loss=0.01055, ecapa_loss=0.0001842, whisper_loss=0.09489, over 19414.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001504, whisper_loss=0.09148, over 3875865.77 frames. ], batch size: 81, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:04:22,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3128770.0, ans=0.09899494936611666 2024-08-15 10:04:22,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3128770.0, ans=0.125 2024-08-15 10:04:50,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3128970.0, ans=0.2 2024-08-15 10:04:50,960 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 10:04:57,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3129070.0, ans=0.125 2024-08-15 10:04:57,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3129070.0, ans=0.125 2024-08-15 10:05:14,330 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
34 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-15 10:05:16,005 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 10:05:19,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-08-15 10:05:25,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8600, loss[loss=0.08983, beats_loss=0.01157, ecapa_loss=0.0001317, whisper_loss=0.07694, over 16082.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01051, ecapa_loss=0.00015, whisper_loss=0.09214, over 3881103.91 frames. ], batch size: 65, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:05:46,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3129370.0, ans=0.0 2024-08-15 10:05:54,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.85 vs. limit=10.0 2024-08-15 10:06:11,550 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 10:06:13,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0 2024-08-15 10:06:23,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3129670.0, ans=0.0 2024-08-15 10:06:24,975 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-15 10:06:37,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.410e+01 2.647e+01 2.940e+01 4.400e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-15 10:06:37,956 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8650, loss[loss=0.09385, beats_loss=0.008452, ecapa_loss=0.0001659, whisper_loss=0.08374, over 15005.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09144, over 3864825.71 frames. ], batch size: 59, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:07:03,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2024-08-15 10:07:07,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3129870.0, ans=0.1 2024-08-15 10:07:12,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3129970.0, ans=0.125 2024-08-15 10:07:19,416 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 10:08:00,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8700, loss[loss=0.125, beats_loss=0.009209, ecapa_loss=0.0001456, whisper_loss=0.1143, over 23364.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01053, ecapa_loss=0.0001501, whisper_loss=0.09167, over 3867856.76 frames. 
], batch size: 90, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:08:05,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3130270.0, ans=0.125 2024-08-15 10:08:14,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3130270.0, ans=0.1 2024-08-15 10:08:18,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-15 10:08:30,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3130370.0, ans=0.0 2024-08-15 10:08:39,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3130470.0, ans=0.125 2024-08-15 10:08:41,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3130470.0, ans=0.125 2024-08-15 10:09:00,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3130570.0, ans=0.125 2024-08-15 10:09:00,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3130570.0, ans=0.1 2024-08-15 10:09:06,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3130670.0, ans=0.0 2024-08-15 10:09:18,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. 
limit=15.0 2024-08-15 10:09:21,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3130770.0, ans=0.0 2024-08-15 10:09:22,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.333e+01 2.545e+01 2.859e+01 2.640e+02, threshold=5.090e+01, percent-clipped=1.0 2024-08-15 10:09:22,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8750, loss[loss=0.1167, beats_loss=0.0104, ecapa_loss=0.0001275, whisper_loss=0.105, over 22875.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001507, whisper_loss=0.09107, over 3825861.86 frames. ], batch size: 88, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:09:29,941 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:09:31,140 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-15 10:09:31,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3130770.0, ans=0.0 2024-08-15 10:09:41,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3130870.0, ans=0.125 2024-08-15 10:09:41,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3130870.0, ans=0.07 2024-08-15 10:09:46,266 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 10:09:54,571 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 10:09:56,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.87 vs. 
limit=22.5 2024-08-15 10:10:16,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-15 10:10:19,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3131070.0, ans=0.1 2024-08-15 10:10:21,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3131070.0, ans=0.2 2024-08-15 10:10:24,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3131070.0, ans=0.0 2024-08-15 10:10:32,222 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-15 10:10:41,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8800, loss[loss=0.08025, beats_loss=0.01159, ecapa_loss=0.0001599, whisper_loss=0.06706, over 18702.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001501, whisper_loss=0.09052, over 3846097.15 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:11:09,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3131470.0, ans=0.125 2024-08-15 10:11:11,330 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 10:11:20,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3131470.0, ans=0.0 2024-08-15 10:11:26,970 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
18 from LS+wenet, 27 from Vox, 48 fro AS 2024-08-15 10:11:43,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3131670.0, ans=0.125 2024-08-15 10:11:52,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3131670.0, ans=0.125 2024-08-15 10:11:54,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.280e+01 2.531e+01 2.796e+01 4.372e+01, threshold=5.061e+01, percent-clipped=0.0 2024-08-15 10:11:54,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8850, loss[loss=0.1005, beats_loss=0.01028, ecapa_loss=0.0001264, whisper_loss=0.08897, over 18447.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0108, ecapa_loss=0.0001495, whisper_loss=0.08962, over 3842175.04 frames. ], batch size: 70, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:12:09,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-15 10:12:13,837 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 10:12:13,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3131870.0, ans=0.125 2024-08-15 10:12:16,781 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 10:12:22,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-15 10:12:24,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3131970.0, ans=0.04949747468305833 2024-08-15 10:12:25,743 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 10:12:33,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3131970.0, ans=0.025 2024-08-15 10:13:01,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=22.5 2024-08-15 10:13:08,900 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8900, loss[loss=0.09672, beats_loss=0.01069, ecapa_loss=0.0001456, whisper_loss=0.08458, over 17378.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001495, whisper_loss=0.09051, over 3855409.41 frames. ], batch size: 71, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:13:14,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3132270.0, ans=0.125 2024-08-15 10:13:15,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3132270.0, ans=10.0 2024-08-15 10:13:15,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-15 10:13:23,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3132370.0, ans=0.0 2024-08-15 10:13:31,642 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-15 10:13:35,427 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 10:13:39,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3132470.0, ans=0.0 2024-08-15 10:13:46,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-15 10:13:46,853 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 10:14:11,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3132670.0, ans=0.1 2024-08-15 10:14:12,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3132670.0, ans=0.125 2024-08-15 10:14:18,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.382e+01 2.533e+01 2.856e+01 1.200e+02, threshold=5.066e+01, percent-clipped=2.0 2024-08-15 10:14:18,829 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 8950, loss[loss=0.1048, beats_loss=0.01129, ecapa_loss=0.0001458, whisper_loss=0.09207, over 19248.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001501, whisper_loss=0.0905, over 3861071.92 frames. ], batch size: 74, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:14:25,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3132770.0, ans=0.0 2024-08-15 10:14:27,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3132770.0, ans=0.2 2024-08-15 10:14:45,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. 
limit=6.0 2024-08-15 10:14:53,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3132970.0, ans=0.1 2024-08-15 10:14:57,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=12.0 2024-08-15 10:15:01,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133070.0, ans=0.1 2024-08-15 10:15:20,000 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 10:15:29,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9000, loss[loss=0.114, beats_loss=0.01088, ecapa_loss=0.000139, whisper_loss=0.1017, over 22375.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001501, whisper_loss=0.09111, over 3872228.17 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:15:29,244 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 10:16:12,802 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005364, whisper_loss=0.2468, over 922467.00 frames. 2024-08-15 10:16:35,752 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on SV_voxceleb1: loss=0.004068, beats_loss=0, ecapa_loss=0.0004068, whisper_loss=0, over 939242.00 frames. 2024-08-15 10:17:27,064 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6985, 2.0015, 2.2458, 1.7943, 1.6383, 2.2001, 2.8209, 1.6418], device='cuda:0') 2024-08-15 10:18:37,891 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on AT_audioset: loss=0.02332, beats_loss=0.02332, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 10:18:37,896 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 10:18:43,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3133270.0, ans=0.0 2024-08-15 10:18:57,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3133370.0, ans=0.0 2024-08-15 10:19:00,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3133370.0, ans=0.125 2024-08-15 10:19:04,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3133470.0, ans=0.125 2024-08-15 10:19:09,896 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 10:19:17,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3133470.0, ans=0.035 2024-08-15 10:19:31,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3133570.0, ans=0.2 2024-08-15 10:19:36,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-15 10:19:36,990 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 10:19:44,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3133670.0, ans=0.0 2024-08-15 10:19:45,266 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 10:19:47,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.28 vs. 
limit=15.0 2024-08-15 10:19:48,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.223e+01 2.558e+01 2.882e+01 3.996e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-15 10:19:48,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9050, loss[loss=0.09062, beats_loss=0.01364, ecapa_loss=9.513e-05, whisper_loss=0.07603, over 16202.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001502, whisper_loss=0.09109, over 3870917.82 frames. ], batch size: 61, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:20:03,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3133870.0, ans=0.125 2024-08-15 10:20:09,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2024-08-15 10:20:16,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3133970.0, ans=0.125 2024-08-15 10:20:27,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3133970.0, ans=0.02 2024-08-15 10:20:44,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-08-15 10:20:57,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9100, loss[loss=0.09626, beats_loss=0.01176, ecapa_loss=0.0001357, whisper_loss=0.08314, over 22517.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001507, whisper_loss=0.09081, over 3873780.22 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:21:02,005 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 10:21:07,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3134270.0, ans=0.125 2024-08-15 10:21:13,058 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-15 10:21:20,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3134370.0, ans=0.125 2024-08-15 10:21:22,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.83 vs. limit=15.0 2024-08-15 10:21:24,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3134470.0, ans=0.09899494936611666 2024-08-15 10:21:35,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3134470.0, ans=0.1 2024-08-15 10:21:48,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3134570.0, ans=0.0 2024-08-15 10:22:08,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.360e+01 2.655e+01 2.981e+01 2.632e+02, threshold=5.310e+01, percent-clipped=2.0 2024-08-15 10:22:08,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9150, loss[loss=0.09271, beats_loss=0.009192, ecapa_loss=0.0001849, whisper_loss=0.08167, over 21075.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001499, whisper_loss=0.09145, over 3899567.60 frames. 
], batch size: 89, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:22:16,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3134770.0, ans=0.125 2024-08-15 10:22:28,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3134870.0, ans=0.0 2024-08-15 10:22:39,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3134970.0, ans=0.125 2024-08-15 10:22:39,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3134970.0, ans=0.125 2024-08-15 10:22:41,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3134970.0, ans=0.0 2024-08-15 10:22:45,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3134970.0, ans=0.125 2024-08-15 10:22:55,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.81 vs. limit=10.0 2024-08-15 10:22:56,780 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-15 10:23:15,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3135170.0, ans=0.0 2024-08-15 10:23:17,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-15 10:23:21,865 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 10:23:23,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9200, loss[loss=0.1057, beats_loss=0.01089, ecapa_loss=0.0001569, whisper_loss=0.09322, over 22065.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001489, whisper_loss=0.09086, over 3905914.06 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:23:26,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3135270.0, ans=0.125 2024-08-15 10:23:31,440 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 10:23:45,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3135370.0, ans=0.125 2024-08-15 10:24:25,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3135570.0, ans=0.0 2024-08-15 10:24:29,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3135670.0, ans=0.125 2024-08-15 10:24:29,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135670.0, ans=0.1 2024-08-15 10:24:30,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-15 10:24:46,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.313e+01 2.628e+01 2.847e+01 1.469e+02, threshold=5.255e+01, percent-clipped=1.0 2024-08-15 10:24:46,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9250, loss[loss=0.1095, beats_loss=0.01002, ecapa_loss=0.0001669, whisper_loss=0.09781, over 21784.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001509, whisper_loss=0.09019, over 3904944.48 frames. ], batch size: 88, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:24:48,326 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 10:24:59,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3135770.0, ans=0.125 2024-08-15 10:25:21,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3135970.0, ans=0.125 2024-08-15 10:25:23,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2024-08-15 10:25:24,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. limit=10.0 2024-08-15 10:25:26,047 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 10:25:46,582 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 10:26:07,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3136170.0, ans=0.2 2024-08-15 10:26:13,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9300, loss[loss=0.08877, beats_loss=0.01273, ecapa_loss=0.0001307, whisper_loss=0.07473, over 15718.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01072, ecapa_loss=0.0001505, whisper_loss=0.08945, over 3870866.20 frames. ], batch size: 61, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:26:15,531 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 10:26:18,775 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:26:23,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3136270.0, ans=0.125 2024-08-15 10:26:24,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2024-08-15 10:26:30,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3136370.0, ans=0.125 2024-08-15 10:26:37,673 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 10:26:44,980 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 10:26:51,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-08-15 10:27:07,320 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 10:27:26,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.363e+01 2.589e+01 2.922e+01 5.036e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-15 10:27:26,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9350, loss[loss=0.1116, beats_loss=0.01127, ecapa_loss=0.0001515, whisper_loss=0.09883, over 19634.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001501, whisper_loss=0.09058, over 3874373.66 frames. 
], batch size: 78, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:27:33,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3136770.0, ans=0.125 2024-08-15 10:27:49,068 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 10:27:57,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. limit=10.0 2024-08-15 10:28:07,489 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 10:28:10,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137070.0, ans=0.125 2024-08-15 10:28:14,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3137070.0, ans=0.0 2024-08-15 10:28:23,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3137170.0, ans=0.2 2024-08-15 10:28:35,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9400, loss[loss=0.08541, beats_loss=0.01175, ecapa_loss=0.0001206, whisper_loss=0.07245, over 22984.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001504, whisper_loss=0.09084, over 3897392.62 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:28:51,317 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 10:29:03,672 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 10:29:06,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3137470.0, ans=0.025 2024-08-15 10:29:08,191 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 10:29:09,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-15 10:29:24,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3137570.0, ans=0.125 2024-08-15 10:29:31,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3137670.0, ans=0.125 2024-08-15 10:29:38,240 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-15 10:29:39,617 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-15 10:29:39,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3137670.0, ans=0.0 2024-08-15 10:29:44,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3137770.0, ans=0.125 2024-08-15 10:29:45,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.347e+01 2.545e+01 2.871e+01 4.993e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-15 10:29:45,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9450, loss[loss=0.1165, beats_loss=0.01028, ecapa_loss=0.0001351, whisper_loss=0.1049, over 22131.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01076, ecapa_loss=0.0001497, whisper_loss=0.09004, over 3881904.18 frames. 
], batch size: 88, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:30:04,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3137870.0, ans=0.04949747468305833 2024-08-15 10:30:08,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3137870.0, ans=0.125 2024-08-15 10:30:09,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3137870.0, ans=0.125 2024-08-15 10:30:10,601 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 10:30:18,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3137970.0, ans=0.0 2024-08-15 10:30:22,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3137970.0, ans=0.125 2024-08-15 10:30:26,817 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 10:30:29,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3138070.0, ans=0.125 2024-08-15 10:30:40,475 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 10:30:40,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3138170.0, ans=0.125 2024-08-15 10:30:43,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3138170.0, ans=0.125 2024-08-15 10:30:52,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3138170.0, ans=0.125 2024-08-15 10:30:54,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9500, loss[loss=0.09428, beats_loss=0.0112, ecapa_loss=0.0001395, whisper_loss=0.08168, over 16370.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01078, ecapa_loss=0.00015, whisper_loss=0.0897, over 3880356.15 frames. ], batch size: 63, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:31:05,909 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 10:31:16,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3138370.0, ans=0.0 2024-08-15 10:31:24,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3138470.0, ans=0.125 2024-08-15 10:31:24,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3138470.0, ans=0.0 2024-08-15 10:31:34,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3138570.0, ans=0.2 2024-08-15 10:31:47,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3138570.0, ans=0.0 2024-08-15 10:31:48,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, 
batch_count=3138670.0, ans=0.125 2024-08-15 10:31:48,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.87 vs. limit=15.0 2024-08-15 10:32:03,579 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.464e+01 2.657e+01 3.059e+01 1.936e+02, threshold=5.313e+01, percent-clipped=3.0 2024-08-15 10:32:03,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9550, loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001356, whisper_loss=0.09221, over 16784.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01077, ecapa_loss=0.0001496, whisper_loss=0.08981, over 3855255.06 frames. ], batch size: 66, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:32:08,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138770.0, ans=0.1 2024-08-15 10:32:11,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3138770.0, ans=0.1 2024-08-15 10:32:16,780 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 10:32:26,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3138870.0, ans=0.125 2024-08-15 10:32:28,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=15.0 2024-08-15 10:32:30,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3138970.0, ans=0.125 2024-08-15 10:32:41,747 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 10:32:49,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-08-15 10:33:00,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3139170.0, ans=0.125 2024-08-15 10:33:13,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3139170.0, ans=0.0 2024-08-15 10:33:15,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9600, loss[loss=0.07591, beats_loss=0.0115, ecapa_loss=0.0001384, whisper_loss=0.06302, over 18259.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001506, whisper_loss=0.08996, over 3853974.47 frames. ], batch size: 74, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:33:28,984 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 10:33:32,317 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 10:33:40,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3139370.0, ans=0.2 2024-08-15 10:33:44,924 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:33:46,310 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 10:33:47,817 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-15 10:33:57,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.61 vs. 
limit=22.5 2024-08-15 10:34:00,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3139570.0, ans=0.0 2024-08-15 10:34:15,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3139670.0, ans=0.0 2024-08-15 10:34:21,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3139670.0, ans=0.125 2024-08-15 10:34:22,724 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-15 10:34:26,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9650, loss[loss=0.1229, beats_loss=0.006731, ecapa_loss=0.0002028, whisper_loss=0.1141, over 19068.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001516, whisper_loss=0.08997, over 3824211.84 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:34:27,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.224e+01 2.493e+01 2.795e+01 4.633e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-15 10:34:33,881 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-15 10:34:45,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=12.0 2024-08-15 10:34:46,087 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 10:34:56,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3139970.0, ans=0.1 2024-08-15 10:35:00,087 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 10:35:04,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3139970.0, ans=0.2 2024-08-15 10:35:11,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3140070.0, ans=0.0 2024-08-15 10:35:19,790 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 10:35:25,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3140170.0, ans=0.125 2024-08-15 10:35:26,552 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 10:35:34,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3140270.0, ans=0.125 2024-08-15 10:35:36,021 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9700, loss[loss=0.1154, beats_loss=0.009379, ecapa_loss=0.0001534, whisper_loss=0.1045, over 19463.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001518, whisper_loss=0.09054, over 3820693.84 frames. 
], batch size: 75, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:35:37,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3140270.0, ans=0.2 2024-08-15 10:35:40,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3140270.0, ans=0.125 2024-08-15 10:35:41,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3140270.0, ans=0.0 2024-08-15 10:36:16,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3140570.0, ans=0.2 2024-08-15 10:36:18,899 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 10:36:19,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-15 10:36:29,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2024-08-15 10:36:32,807 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-15 10:36:34,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3140670.0, ans=0.125 2024-08-15 10:36:40,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-15 10:36:42,607 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 10:36:45,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9750, loss[loss=0.1006, beats_loss=0.009197, ecapa_loss=0.0001752, whisper_loss=0.08964, over 19797.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001509, whisper_loss=0.09003, over 3811464.12 frames. ], batch size: 79, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:36:46,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.354e+01 2.591e+01 2.841e+01 9.647e+01, threshold=5.183e+01, percent-clipped=2.0 2024-08-15 10:36:47,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3140770.0, ans=10.0 2024-08-15 10:36:50,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3140770.0, ans=0.0 2024-08-15 10:37:03,626 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 10:37:31,833 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 10:37:36,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3141070.0, ans=0.1 2024-08-15 10:37:37,424 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 10:37:40,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2024-08-15 10:37:44,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3141170.0, ans=0.0 2024-08-15 10:37:48,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3141170.0, ans=0.2 2024-08-15 10:37:55,836 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9800, loss[loss=0.08627, beats_loss=0.01151, ecapa_loss=0.0001409, whisper_loss=0.07335, over 18877.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01065, ecapa_loss=0.0001502, whisper_loss=0.08958, over 3800538.73 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:38:15,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-15 10:38:21,401 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 30 from Vox, 21 fro AS 2024-08-15 10:38:22,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3141470.0, ans=0.125 2024-08-15 10:38:39,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3141570.0, ans=0.0 2024-08-15 10:38:44,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=3141570.0, ans=12.0 2024-08-15 10:39:05,430 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9850, loss[loss=0.07037, beats_loss=0.01313, ecapa_loss=0.0001469, whisper_loss=0.05577, over 14929.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001506, whisper_loss=0.09025, over 3809236.97 frames. ], batch size: 60, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:39:06,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.300e+01 2.506e+01 2.923e+01 9.908e+01, threshold=5.012e+01, percent-clipped=1.0 2024-08-15 10:39:09,760 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
41 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 10:39:16,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3141770.0, ans=0.125 2024-08-15 10:39:18,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3141870.0, ans=0.0 2024-08-15 10:39:52,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-15 10:39:59,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3142170.0, ans=0.09899494936611666 2024-08-15 10:40:05,312 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-15 10:40:12,347 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 10:40:13,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9900, loss[loss=0.09772, beats_loss=0.01123, ecapa_loss=0.0001095, whisper_loss=0.0854, over 14556.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09113, over 3840106.55 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:40:51,114 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 10:40:52,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3142470.0, ans=0.0 2024-08-15 10:40:53,533 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 10:41:11,114 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 10:41:13,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142670.0, ans=0.125 2024-08-15 10:41:22,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 9950, loss[loss=0.09173, beats_loss=0.01333, ecapa_loss=0.0001297, whisper_loss=0.0771, over 22770.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001504, whisper_loss=0.0907, over 3833163.13 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:41:24,921 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.640e+01 2.920e+01 4.147e+01, threshold=5.279e+01, percent-clipped=0.0 2024-08-15 10:41:26,623 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 10:41:44,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3142870.0, ans=0.125 2024-08-15 10:41:55,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-08-15 10:41:59,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3142970.0, ans=0.1 2024-08-15 10:42:05,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3143070.0, ans=0.125 2024-08-15 10:42:09,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3143070.0, ans=0.125 2024-08-15 10:42:25,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3143170.0, ans=0.2 2024-08-15 10:42:28,594 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
18 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 10:42:32,807 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10000, loss[loss=0.09887, beats_loss=0.0112, ecapa_loss=0.0001433, whisper_loss=0.08624, over 19481.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001525, whisper_loss=0.09055, over 3853542.72 frames. ], batch size: 78, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:42:33,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3143270.0, ans=0.125 2024-08-15 10:42:33,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3143270.0, ans=0.125 2024-08-15 10:42:38,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3143270.0, ans=0.0 2024-08-15 10:43:04,267 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 10:43:50,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10050, loss[loss=0.1076, beats_loss=0.01095, ecapa_loss=0.0001569, whisper_loss=0.09512, over 22779.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001521, whisper_loss=0.09057, over 3864629.12 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:43:53,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.380e+01 2.609e+01 2.956e+01 1.893e+02, threshold=5.219e+01, percent-clipped=1.0 2024-08-15 10:43:59,029 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 10:44:26,973 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 10:44:50,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3144070.0, ans=0.025 2024-08-15 10:45:05,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2024-08-15 10:45:13,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3144170.0, ans=0.125 2024-08-15 10:45:16,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3144170.0, ans=0.125 2024-08-15 10:45:21,133 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 10:45:26,971 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 10:45:28,056 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10100, loss[loss=0.1027, beats_loss=0.01115, ecapa_loss=0.0001345, whisper_loss=0.09018, over 18313.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001519, whisper_loss=0.09035, over 3889926.66 frames. ], batch size: 71, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:45:30,188 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 10:45:33,265 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 10:45:36,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3144270.0, ans=0.2 2024-08-15 10:45:55,992 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-15 10:45:58,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.12 vs. limit=10.0 2024-08-15 10:46:11,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-15 10:46:15,355 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 26 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-15 10:46:22,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3144470.0, ans=0.125 2024-08-15 10:46:52,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3144570.0, ans=0.1 2024-08-15 10:47:24,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10150, loss[loss=0.1181, beats_loss=0.008362, ecapa_loss=0.00014, whisper_loss=0.1083, over 19322.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001531, whisper_loss=0.09038, over 3896802.31 frames. ], batch size: 74, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:47:29,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.324e+01 2.588e+01 2.924e+01 3.968e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-15 10:47:36,742 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 10:48:29,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.52 vs. 
limit=22.5 2024-08-15 10:48:39,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3145070.0, ans=0.125 2024-08-15 10:48:57,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=12.0 2024-08-15 10:49:06,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10200, loss[loss=0.1084, beats_loss=0.01228, ecapa_loss=0.0001328, whisper_loss=0.0948, over 23358.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001531, whisper_loss=0.09127, over 3910845.09 frames. ], batch size: 92, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:49:06,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2024-08-15 10:49:14,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3145270.0, ans=0.125 2024-08-15 10:49:15,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3145270.0, ans=0.125 2024-08-15 10:49:39,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3145470.0, ans=0.1 2024-08-15 10:49:54,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3145570.0, ans=0.125 2024-08-15 10:49:56,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3145570.0, ans=0.125 2024-08-15 10:50:02,041 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 10:50:02,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3145570.0, ans=0.125 2024-08-15 10:50:02,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2024-08-15 10:50:15,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3145670.0, ans=0.2 2024-08-15 10:50:20,538 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-15 10:50:23,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10250, loss[loss=0.09308, beats_loss=0.008393, ecapa_loss=0.0001923, whisper_loss=0.08277, over 21760.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.000154, whisper_loss=0.09094, over 3922954.60 frames. ], batch size: 91, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:50:24,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3145770.0, ans=0.125 2024-08-15 10:50:26,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.258e+01 2.433e+01 2.798e+01 3.625e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-15 10:51:02,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3145970.0, ans=0.1 2024-08-15 10:51:42,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10300, loss[loss=0.1116, beats_loss=0.008077, ecapa_loss=0.0001784, whisper_loss=0.1017, over 17100.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001535, whisper_loss=0.09019, over 3926490.48 frames. 
], batch size: 71, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:52:24,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3146470.0, ans=0.125 2024-08-15 10:52:30,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=15.0 2024-08-15 10:52:34,325 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-15 10:52:37,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3146570.0, ans=0.2 2024-08-15 10:53:05,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10350, loss[loss=0.1292, beats_loss=0.009194, ecapa_loss=0.0001763, whisper_loss=0.1183, over 22125.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001532, whisper_loss=0.09029, over 3940408.20 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:53:08,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.405e+01 2.644e+01 3.063e+01 2.497e+02, threshold=5.287e+01, percent-clipped=1.0 2024-08-15 10:53:13,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3146770.0, ans=0.07 2024-08-15 10:53:19,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3146870.0, ans=0.125 2024-08-15 10:53:20,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0 2024-08-15 10:53:41,150 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 10:53:42,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3146970.0, ans=0.1 2024-08-15 10:53:48,317 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.208e+05 2024-08-15 10:54:22,399 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 10:54:23,773 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 34 from Vox, 36 fro AS 2024-08-15 10:54:24,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0 2024-08-15 10:54:28,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10400, loss[loss=0.09271, beats_loss=0.01231, ecapa_loss=0.0001152, whisper_loss=0.07925, over 19297.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001512, whisper_loss=0.08938, over 3906311.21 frames. ], batch size: 75, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:54:38,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3147270.0, ans=0.2 2024-08-15 10:54:41,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3147270.0, ans=0.125 2024-08-15 10:55:09,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3147470.0, ans=0.125 2024-08-15 10:55:14,479 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
31 from LS+wenet, 8 from Vox, 23 fro AS 2024-08-15 10:55:16,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3147470.0, ans=0.0 2024-08-15 10:55:24,489 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 10:55:52,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10450, loss[loss=0.09717, beats_loss=0.01043, ecapa_loss=0.0001593, whisper_loss=0.08515, over 16494.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01063, ecapa_loss=0.0001503, whisper_loss=0.0894, over 3880773.74 frames. ], batch size: 69, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:55:52,393 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-15 10:55:54,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.272e+01 2.480e+01 2.758e+01 4.514e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-15 10:56:07,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3147870.0, ans=0.125 2024-08-15 10:56:13,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3147870.0, ans=0.1 2024-08-15 10:56:25,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3147970.0, ans=0.0 2024-08-15 10:56:27,939 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 10:56:45,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3148070.0, ans=0.125 2024-08-15 10:56:57,687 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.364e+05 2024-08-15 10:57:00,942 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-15 10:57:02,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2024-08-15 10:57:08,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10500, loss[loss=0.07347, beats_loss=0.01111, ecapa_loss=0.0001811, whisper_loss=0.06056, over 18269.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.000151, whisper_loss=0.09002, over 3873560.71 frames. ], batch size: 81, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:57:22,208 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 10:57:28,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-08-15 10:57:33,095 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 10:57:38,298 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 10:57:51,629 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:57:52,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. 
limit=15.0 2024-08-15 10:57:56,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3148570.0, ans=0.0 2024-08-15 10:58:03,621 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 10:58:23,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3148670.0, ans=0.0 2024-08-15 10:58:26,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3148670.0, ans=0.0 2024-08-15 10:58:31,826 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10550, loss[loss=0.09865, beats_loss=0.00965, ecapa_loss=0.0002025, whisper_loss=0.08698, over 21238.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001511, whisper_loss=0.09019, over 3854717.04 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:58:34,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.372e+01 2.650e+01 2.883e+01 3.926e+01, threshold=5.299e+01, percent-clipped=0.0 2024-08-15 10:58:38,082 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 10:58:52,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=12.0 2024-08-15 10:58:57,051 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 25 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-15 10:59:09,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3148970.0, ans=0.125 2024-08-15 10:59:34,621 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 10:59:49,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10600, loss[loss=0.0979, beats_loss=0.01157, ecapa_loss=0.0001084, whisper_loss=0.08524, over 21020.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001513, whisper_loss=0.08993, over 3846807.11 frames. ], batch size: 81, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:59:49,237 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 10:59:56,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3149270.0, ans=0.1 2024-08-15 11:00:05,667 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 11:00:07,013 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 11:00:32,647 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 11:00:39,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2024-08-15 11:00:40,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3149570.0, ans=0.125 2024-08-15 11:00:44,123 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 11:00:50,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3149670.0, ans=0.0 2024-08-15 11:01:04,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3149670.0, ans=0.0 2024-08-15 11:01:06,362 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10650, loss[loss=0.09817, beats_loss=0.01297, ecapa_loss=0.0001413, whisper_loss=0.08379, over 22239.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001503, whisper_loss=0.08951, over 3816135.05 frames. ], batch size: 92, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:01:06,580 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-15 11:01:09,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.413e+01 2.629e+01 2.898e+01 3.897e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-15 11:01:42,001 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 11:01:47,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3149970.0, ans=0.1 2024-08-15 11:01:48,518 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 11:01:48,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3149970.0, ans=0.0 2024-08-15 11:01:51,143 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 11:01:54,962 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 11:02:03,095 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
12 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 11:02:03,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3150070.0, ans=0.125 2024-08-15 11:02:09,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3150070.0, ans=0.125 2024-08-15 11:02:11,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3150070.0, ans=0.125 2024-08-15 11:02:22,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-15 11:02:30,775 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10700, loss[loss=0.09351, beats_loss=0.01072, ecapa_loss=0.0001415, whisper_loss=0.08137, over 16716.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001495, whisper_loss=0.09009, over 3843850.64 frames. ], batch size: 67, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:02:40,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3150270.0, ans=0.125 2024-08-15 11:02:42,408 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:02:50,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3150370.0, ans=0.1 2024-08-15 11:02:53,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3150370.0, ans=0.09899494936611666 2024-08-15 11:03:23,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.20 vs. 
limit=10.0 2024-08-15 11:03:40,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-15 11:03:44,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10750, loss[loss=0.1251, beats_loss=0.01032, ecapa_loss=0.000148, whisper_loss=0.1133, over 23152.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001496, whisper_loss=0.0904, over 3823875.34 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:03:47,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.22 vs. limit=10.0 2024-08-15 11:03:47,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.262e+01 2.469e+01 2.772e+01 4.273e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-15 11:03:57,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3150870.0, ans=0.2 2024-08-15 11:04:07,996 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 11:04:22,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3150970.0, ans=0.125 2024-08-15 11:04:29,560 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 11:04:35,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3151070.0, ans=0.125 2024-08-15 11:04:52,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.64 vs. 
limit=22.5 2024-08-15 11:04:54,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3151170.0, ans=0.125 2024-08-15 11:04:58,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10800, loss[loss=0.1034, beats_loss=0.01154, ecapa_loss=0.0001474, whisper_loss=0.09042, over 16349.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001491, whisper_loss=0.09088, over 3839237.92 frames. ], batch size: 64, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:05:00,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3151270.0, ans=0.05 2024-08-15 11:05:00,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3151270.0, ans=0.1 2024-08-15 11:05:15,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-15 11:05:29,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3151470.0, ans=0.125 2024-08-15 11:05:39,720 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-15 11:05:45,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3151470.0, ans=0.0 2024-08-15 11:05:47,309 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 11:06:17,838 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 11:06:22,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10850, loss[loss=0.09568, beats_loss=0.009724, ecapa_loss=0.0001845, whisper_loss=0.08412, over 16630.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.0001496, whisper_loss=0.09172, over 3869976.75 frames. ], batch size: 69, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:06:26,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.371e+01 2.566e+01 2.885e+01 4.578e+01, threshold=5.132e+01, percent-clipped=0.0 2024-08-15 11:06:27,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3151770.0, ans=0.2 2024-08-15 11:06:31,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3151770.0, ans=0.125 2024-08-15 11:06:36,636 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 11:07:05,746 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 11:07:12,663 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 11:07:26,354 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 11:07:31,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-08-15 11:07:39,793 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 11:07:44,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10900, loss[loss=0.1067, beats_loss=0.009643, ecapa_loss=0.0001297, whisper_loss=0.09573, over 18597.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01048, ecapa_loss=0.0001509, whisper_loss=0.09221, over 3935633.30 frames. ], batch size: 71, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:08:05,216 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 11:08:12,722 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-15 11:08:16,099 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 11:09:07,111 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 11:09:09,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 10950, loss[loss=0.1316, beats_loss=0.006957, ecapa_loss=0.0002054, whisper_loss=0.1226, over 22773.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01047, ecapa_loss=0.0001511, whisper_loss=0.09248, over 3927301.44 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:09:12,169 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.384e+01 2.656e+01 2.933e+01 4.855e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-15 11:09:59,296 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 17 from LS+wenet, 24 from Vox, 50 fro AS 2024-08-15 11:09:59,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3153070.0, ans=0.125 2024-08-15 11:10:06,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153070.0, ans=0.1 2024-08-15 11:10:18,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3153170.0, ans=0.0 2024-08-15 11:10:25,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11000, loss[loss=0.1143, beats_loss=0.008447, ecapa_loss=0.0001646, whisper_loss=0.1042, over 14737.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01046, ecapa_loss=0.0001526, whisper_loss=0.09238, over 3944706.44 frames. 
], batch size: 59, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:10:57,585 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 11:10:59,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3153470.0, ans=0.0 2024-08-15 11:11:02,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3153470.0, ans=0.125 2024-08-15 11:11:16,464 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.234e-03 2024-08-15 11:11:20,233 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-15 11:11:23,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3153670.0, ans=0.125 2024-08-15 11:11:38,498 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11050, loss[loss=0.0861, beats_loss=0.01117, ecapa_loss=0.0001571, whisper_loss=0.07336, over 20622.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01051, ecapa_loss=0.0001534, whisper_loss=0.09163, over 3935930.78 frames. ], batch size: 86, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:11:41,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.292e+01 2.575e+01 2.942e+01 2.806e+02, threshold=5.150e+01, percent-clipped=2.0 2024-08-15 11:11:51,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3153770.0, ans=0.2 2024-08-15 11:12:02,681 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 11:12:07,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3153970.0, ans=0.125 2024-08-15 11:12:10,026 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 11:12:26,504 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 11:12:30,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3154070.0, ans=0.2 2024-08-15 11:12:31,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3154070.0, ans=0.0 2024-08-15 11:13:00,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11100, loss[loss=0.09011, beats_loss=0.01062, ecapa_loss=0.0001523, whisper_loss=0.07796, over 22638.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001533, whisper_loss=0.09138, over 3952485.23 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:13:11,102 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 11:13:41,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3154470.0, ans=0.2 2024-08-15 11:14:03,239 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 11:14:08,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3154670.0, ans=0.125 2024-08-15 11:14:16,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11150, loss[loss=0.09624, beats_loss=0.01034, ecapa_loss=0.0001529, whisper_loss=0.08437, over 16787.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01049, ecapa_loss=0.0001523, whisper_loss=0.09146, over 3929968.13 frames. 
], batch size: 67, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:14:18,083 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 11:14:19,203 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.361e+01 2.547e+01 2.785e+01 4.285e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-15 11:14:30,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.38 vs. limit=22.5 2024-08-15 11:14:38,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-15 11:14:54,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3154970.0, ans=0.125 2024-08-15 11:14:54,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2024-08-15 11:15:15,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.93 vs. limit=22.5 2024-08-15 11:15:17,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3155170.0, ans=0.125 2024-08-15 11:15:25,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3155170.0, ans=0.95 2024-08-15 11:15:31,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11200, loss[loss=0.07353, beats_loss=0.01065, ecapa_loss=0.0001746, whisper_loss=0.06114, over 14198.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01046, ecapa_loss=0.0001531, whisper_loss=0.09121, over 3891123.74 frames. 
], batch size: 58, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:15:31,347 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 11:15:47,936 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 11:15:51,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3155370.0, ans=0.125 2024-08-15 11:15:56,943 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 11:16:04,561 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 11:16:16,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.29 vs. limit=10.0 2024-08-15 11:16:42,423 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 11:16:43,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11250, loss[loss=0.1357, beats_loss=0.008359, ecapa_loss=0.0001699, whisper_loss=0.1256, over 18380.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.000153, whisper_loss=0.09141, over 3900990.22 frames. ], batch size: 73, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:16:46,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.380e+01 2.622e+01 3.019e+01 1.107e+02, threshold=5.243e+01, percent-clipped=1.0 2024-08-15 11:16:55,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.41 vs. 
limit=15.0 2024-08-15 11:17:09,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3155870.0, ans=0.2 2024-08-15 11:17:21,443 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 11:17:33,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3156070.0, ans=0.125 2024-08-15 11:17:35,393 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 11:17:53,290 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 11:17:58,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3156170.0, ans=0.0 2024-08-15 11:18:00,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11300, loss[loss=0.09141, beats_loss=0.01167, ecapa_loss=0.0001772, whisper_loss=0.07796, over 19571.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001521, whisper_loss=0.09139, over 3921335.28 frames. ], batch size: 83, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:18:02,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3156270.0, ans=0.125 2024-08-15 11:18:03,662 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 11:18:03,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3156270.0, ans=0.1 2024-08-15 11:18:05,277 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 11:18:19,001 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
36 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 11:18:19,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3156370.0, ans=0.0 2024-08-15 11:18:33,834 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 11:18:35,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3156470.0, ans=0.125 2024-08-15 11:18:44,934 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 11:18:45,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3156470.0, ans=0.2 2024-08-15 11:18:49,360 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:19:03,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2024-08-15 11:19:11,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3156670.0, ans=0.125 2024-08-15 11:19:15,627 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 11:19:15,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3156670.0, ans=0.1 2024-08-15 11:19:20,869 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-15 11:19:26,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11350, loss[loss=0.1167, beats_loss=0.009082, ecapa_loss=0.0001672, whisper_loss=0.106, over 21312.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01048, ecapa_loss=0.0001522, whisper_loss=0.09194, over 3921642.00 frames. 
], batch size: 83, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:19:29,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.374e+01 2.563e+01 2.940e+01 7.855e+01, threshold=5.126e+01, percent-clipped=1.0 2024-08-15 11:19:52,474 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 11:20:01,479 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 11:20:18,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-08-15 11:20:23,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3157070.0, ans=0.1 2024-08-15 11:20:30,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3157170.0, ans=0.0 2024-08-15 11:20:40,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11400, loss[loss=0.09435, beats_loss=0.009554, ecapa_loss=0.0001561, whisper_loss=0.08324, over 13421.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01055, ecapa_loss=0.0001521, whisper_loss=0.09155, over 3915324.81 frames. ], batch size: 54, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:20:42,121 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 11:20:51,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=3157270.0, ans=22.5 2024-08-15 11:20:53,279 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 11:21:20,730 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 11:21:27,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3157570.0, ans=0.125 2024-08-15 11:21:30,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-15 11:21:41,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=22.5 2024-08-15 11:21:44,191 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 11:21:47,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3157670.0, ans=0.125 2024-08-15 11:22:01,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11450, loss[loss=0.08419, beats_loss=0.01411, ecapa_loss=0.0001165, whisper_loss=0.06892, over 21941.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09103, over 3923823.50 frames. ], batch size: 92, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:22:04,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.396e+01 2.613e+01 2.879e+01 7.410e+02, threshold=5.227e+01, percent-clipped=0.0 2024-08-15 11:22:04,386 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07053599506616592, model_norm_threshold=52.26521682739258 2024-08-15 11:22:04,579 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.421e+04, grad_sumsq=9.420e+04, orig_rms_sq=5.754e-01 2024-08-15 11:22:05,865 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 11:22:12,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3157770.0, ans=0.125 2024-08-15 11:22:16,281 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 11:22:24,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3157870.0, ans=0.125 2024-08-15 11:22:24,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3157870.0, ans=0.2 2024-08-15 11:22:29,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3157870.0, ans=0.1 2024-08-15 11:22:38,576 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-15 11:22:42,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3157970.0, ans=0.035 2024-08-15 11:22:45,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-15 11:22:53,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3158070.0, ans=0.0 2024-08-15 11:23:01,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3158070.0, ans=0.125 2024-08-15 11:23:22,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3158270.0, ans=0.125 2024-08-15 11:23:23,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11500, loss[loss=0.09681, beats_loss=0.01251, ecapa_loss=0.0001877, whisper_loss=0.08243, over 21789.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001515, whisper_loss=0.09107, over 3927605.20 frames. ], batch size: 95, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:23:39,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2024-08-15 11:23:48,335 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 11:24:11,409 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-15 11:24:23,272 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-15 11:24:29,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.92 vs. limit=10.0 2024-08-15 11:24:30,928 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 11:24:36,825 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 11:24:40,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11550, loss[loss=0.1098, beats_loss=0.008833, ecapa_loss=0.0001573, whisper_loss=0.09938, over 17668.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001506, whisper_loss=0.09125, over 3913240.45 frames. ], batch size: 68, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:24:44,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.411e+01 2.579e+01 2.880e+01 5.127e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-15 11:24:53,827 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 11:25:09,458 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 11:25:09,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3158870.0, ans=0.125 2024-08-15 11:25:09,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3158870.0, ans=0.0 2024-08-15 11:25:15,313 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 11:25:29,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-08-15 11:25:37,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0 2024-08-15 11:25:50,020 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 11:25:57,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11600, loss[loss=0.1112, beats_loss=0.008615, ecapa_loss=0.0001591, whisper_loss=0.101, over 21213.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001504, whisper_loss=0.09131, over 3932062.89 frames. ], batch size: 84, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:26:06,767 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 11:26:21,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3159370.0, ans=0.1 2024-08-15 11:26:49,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3159570.0, ans=0.125 2024-08-15 11:26:54,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. 
limit=6.0 2024-08-15 11:27:14,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3159670.0, ans=0.125 2024-08-15 11:27:16,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11650, loss[loss=0.1364, beats_loss=0.007914, ecapa_loss=0.0001563, whisper_loss=0.1269, over 22817.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001499, whisper_loss=0.09129, over 3939337.23 frames. ], batch size: 89, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:27:19,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.447e+01 2.693e+01 2.991e+01 1.020e+02, threshold=5.386e+01, percent-clipped=2.0 2024-08-15 11:27:32,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3159870.0, ans=0.125 2024-08-15 11:27:33,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.66 vs. limit=22.5 2024-08-15 11:27:50,524 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-316000.pt 2024-08-15 11:27:56,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3159970.0, ans=0.09899494936611666 2024-08-15 11:28:04,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3160070.0, ans=0.2 2024-08-15 11:28:13,774 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 11:28:24,776 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
25 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-15 11:28:34,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11700, loss[loss=0.07898, beats_loss=0.01301, ecapa_loss=0.0001566, whisper_loss=0.06441, over 18192.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001503, whisper_loss=0.09127, over 3939205.71 frames. ], batch size: 77, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:28:59,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3160370.0, ans=0.125 2024-08-15 11:29:00,112 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 11:29:10,835 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 11:29:16,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3160470.0, ans=0.125 2024-08-15 11:29:27,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-15 11:29:48,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11750, loss[loss=0.09721, beats_loss=0.00883, ecapa_loss=0.0001938, whisper_loss=0.08644, over 16230.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.09118, over 3907650.57 frames. 
], batch size: 66, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:29:52,018 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.446e+01 2.685e+01 3.012e+01 3.635e+02, threshold=5.370e+01, percent-clipped=2.0 2024-08-15 11:29:59,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3160770.0, ans=0.0 2024-08-15 11:30:05,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3160870.0, ans=0.0 2024-08-15 11:30:14,621 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 11:30:16,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3160870.0, ans=0.95 2024-08-15 11:30:19,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3160970.0, ans=0.1 2024-08-15 11:30:28,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3160970.0, ans=0.125 2024-08-15 11:30:29,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3160970.0, ans=0.125 2024-08-15 11:30:32,433 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 11:30:44,155 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 11:31:03,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11800, loss[loss=0.0922, beats_loss=0.00996, ecapa_loss=0.0001745, whisper_loss=0.0805, over 21380.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001503, whisper_loss=0.09061, over 3908495.73 frames. 
], batch size: 88, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:31:14,954 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 11:31:17,774 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 11:31:22,444 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 33 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 11:31:29,578 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 11:31:44,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=15.0 2024-08-15 11:31:48,538 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.649e-01 2024-08-15 11:31:59,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3161670.0, ans=0.0 2024-08-15 11:32:13,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3161670.0, ans=0.2 2024-08-15 11:32:14,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3161770.0, ans=0.0 2024-08-15 11:32:15,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11850, loss[loss=0.1061, beats_loss=0.009484, ecapa_loss=0.0001437, whisper_loss=0.0952, over 19062.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001496, whisper_loss=0.09063, over 3932727.64 frames. ], batch size: 77, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:32:16,965 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 11:32:17,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.442e+01 2.720e+01 2.983e+01 2.168e+02, threshold=5.440e+01, percent-clipped=2.0 2024-08-15 11:32:21,079 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 11:32:25,412 WARNING [optim.py:496] (0/4) Scaling gradients by 0.029826095327734947, model_norm_threshold=54.40060806274414 2024-08-15 11:32:25,589 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.872e+05, grad_sumsq=7.654e+04, orig_rms_sq=8.977e+00 2024-08-15 11:32:39,117 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 11:32:41,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3161870.0, ans=15.0 2024-08-15 11:32:43,731 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 11:32:45,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3161970.0, ans=0.125 2024-08-15 11:33:29,023 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11900, loss[loss=0.08933, beats_loss=0.01218, ecapa_loss=0.0001239, whisper_loss=0.07591, over 20579.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001501, whisper_loss=0.09169, over 3946747.87 frames. ], batch size: 78, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:33:50,601 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 11:33:55,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3162370.0, ans=0.0 2024-08-15 11:33:57,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3162370.0, ans=0.0 2024-08-15 11:34:10,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3162470.0, ans=0.2 2024-08-15 11:34:43,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 11950, loss[loss=0.0922, beats_loss=0.005969, ecapa_loss=0.0001887, whisper_loss=0.08434, over 16685.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01055, ecapa_loss=0.00015, whisper_loss=0.09175, over 3912723.46 frames. ], batch size: 64, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:34:46,925 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.257e+01 2.477e+01 2.736e+01 1.824e+03, threshold=4.954e+01, percent-clipped=1.0 2024-08-15 11:35:07,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3162870.0, ans=0.2 2024-08-15 11:35:09,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3162870.0, ans=0.125 2024-08-15 11:35:17,480 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 11:35:18,900 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-15 11:35:22,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3162970.0, ans=0.07 2024-08-15 11:35:29,302 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
12 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 11:35:38,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3163070.0, ans=0.2 2024-08-15 11:35:57,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12000, loss[loss=0.08726, beats_loss=0.01083, ecapa_loss=0.0001568, whisper_loss=0.07485, over 16907.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001498, whisper_loss=0.09126, over 3913581.86 frames. ], batch size: 68, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:35:57,249 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 11:36:35,772 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005396, whisper_loss=0.2462, over 922467.00 frames. 2024-08-15 11:36:56,045 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on SV_voxceleb1: loss=0.004196, beats_loss=0, ecapa_loss=0.0004196, whisper_loss=0, over 939242.00 frames. 2024-08-15 11:38:25,121 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2522, 2.6737, 2.5344, 2.5273], device='cuda:0') 2024-08-15 11:38:51,382 INFO [train_multi_KD3.py:1149] (0/4) Epoch 22, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
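The per-batch and running loss values in the training entries above are consistent with a weighted sum of the three distillation losses, with the ecapa term weighted by 10 (matching the `scale_10.0` in the experiment directory name and the `ecapa_loss_scale` option in the header). A minimal sketch of that relationship — helper name hypothetical, not icefall's actual API:

```python
def batch_loss(beats_loss, ecapa_loss, whisper_loss,
               beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum matching the logged totals, e.g. for batch 11850:
    0.009484 + 10 * 0.0001437 + 0.0952 ~= 0.1061 (the logged loss)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)
```

The `tot_loss` fields appear to be the same sum applied to frame-weighted running averages of each component, and the validation entries zero out the components that do not apply to each task (e.g. `beats_loss=0` for ASR_libri, only `ecapa_loss` for SV_voxceleb1, only `beats_loss` for AT_audioset).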
2024-08-15 11:38:51,387 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 11:38:53,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3163270.0, ans=0.125 2024-08-15 11:39:10,154 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:39:23,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3163470.0, ans=0.0 2024-08-15 11:39:25,752 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 11:39:33,009 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 11:39:36,012 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-15 11:39:37,626 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-15 11:39:40,529 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 11:39:40,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3163570.0, ans=0.0 2024-08-15 11:39:43,588 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 19 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-15 11:39:52,288 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 11:39:52,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3163670.0, ans=0.125 2024-08-15 11:39:53,893 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 11:40:04,839 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12050, loss[loss=0.08205, beats_loss=0.01307, ecapa_loss=0.0001288, whisper_loss=0.06769, over 22793.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001495, whisper_loss=0.09109, over 3939803.58 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:40:05,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3163770.0, ans=0.125 2024-08-15 11:40:07,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.427e+01 2.583e+01 3.021e+01 1.024e+02, threshold=5.165e+01, percent-clipped=2.0 2024-08-15 11:40:42,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3163970.0, ans=0.0 2024-08-15 11:40:44,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3163970.0, ans=0.025 2024-08-15 11:40:47,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3163970.0, ans=0.125 2024-08-15 11:40:59,969 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 11:41:01,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3164070.0, ans=0.125 2024-08-15 11:41:17,940 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-15 11:41:18,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3164270.0, ans=0.1 2024-08-15 11:41:19,143 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12100, loss[loss=0.09486, beats_loss=0.01314, ecapa_loss=0.0001637, whisper_loss=0.08009, over 16450.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001498, whisper_loss=0.09071, over 3942127.03 frames. ], batch size: 68, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:41:45,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3164370.0, ans=0.0 2024-08-15 11:42:24,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3164670.0, ans=0.125 2024-08-15 11:42:29,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=12.0 2024-08-15 11:42:31,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12150, loss[loss=0.1077, beats_loss=0.01389, ecapa_loss=0.000108, whisper_loss=0.09273, over 23892.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001499, whisper_loss=0.09092, over 3927273.42 frames. ], batch size: 92, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:42:32,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-08-15 11:42:34,333 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.205e+01 2.453e+01 2.798e+01 9.875e+01, threshold=4.907e+01, percent-clipped=1.0 2024-08-15 11:42:57,558 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
24 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-15 11:43:03,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3164970.0, ans=0.0 2024-08-15 11:43:03,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3164970.0, ans=0.0 2024-08-15 11:43:13,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3164970.0, ans=0.2 2024-08-15 11:43:46,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12200, loss[loss=0.1184, beats_loss=0.01104, ecapa_loss=0.0001143, whisper_loss=0.1063, over 18591.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001491, whisper_loss=0.09151, over 3893989.47 frames. ], batch size: 67, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:44:00,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3165370.0, ans=0.2 2024-08-15 11:44:05,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3165370.0, ans=12.0 2024-08-15 11:44:17,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3165470.0, ans=0.125 2024-08-15 11:44:31,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165570.0, ans=0.1 2024-08-15 11:44:50,713 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 11:44:52,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. 
limit=15.0 2024-08-15 11:44:56,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2024-08-15 11:45:01,771 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12250, loss[loss=0.1016, beats_loss=0.009958, ecapa_loss=0.0001577, whisper_loss=0.09005, over 22261.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01054, ecapa_loss=0.0001498, whisper_loss=0.09174, over 3895685.44 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:45:03,608 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 11:45:04,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.418e+01 2.740e+01 3.244e+01 5.356e+01, threshold=5.480e+01, percent-clipped=1.0 2024-08-15 11:45:15,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0 2024-08-15 11:45:18,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3165870.0, ans=0.2 2024-08-15 11:45:20,152 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 11:45:24,710 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 11:45:25,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3165870.0, ans=0.95 2024-08-15 11:45:52,838 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 11:46:09,179 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 11:46:16,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12300, loss[loss=0.1014, beats_loss=0.009448, ecapa_loss=0.0001923, whisper_loss=0.09005, over 13253.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001502, whisper_loss=0.09134, over 3861659.84 frames. ], batch size: 55, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:46:22,445 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-15 11:46:53,460 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 11:46:59,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3166570.0, ans=0.1 2024-08-15 11:46:59,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3166570.0, ans=0.1 2024-08-15 11:47:05,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3166570.0, ans=0.5 2024-08-15 11:47:29,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12350, loss[loss=0.1098, beats_loss=0.01057, ecapa_loss=0.0001509, whisper_loss=0.0977, over 22996.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001511, whisper_loss=0.0916, over 3889239.24 frames. ], batch size: 92, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:47:32,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.362e+01 2.585e+01 2.912e+01 4.342e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 11:47:40,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3166770.0, ans=0.125 2024-08-15 11:48:02,992 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
18 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-15 11:48:19,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2024-08-15 11:48:22,153 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 11:48:30,013 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 11:48:43,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12400, loss[loss=0.12, beats_loss=0.009332, ecapa_loss=0.0001297, whisper_loss=0.1094, over 14143.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001507, whisper_loss=0.09165, over 3892325.43 frames. ], batch size: 53, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:48:58,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3167370.0, ans=0.0 2024-08-15 11:48:59,453 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 11:49:02,045 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 11:49:41,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3167570.0, ans=0.05 2024-08-15 11:49:53,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-15 11:49:58,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12450, loss[loss=0.09018, beats_loss=0.0116, ecapa_loss=0.000144, whisper_loss=0.07715, over 21921.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001506, whisper_loss=0.09141, over 3909627.49 frames. 
], batch size: 88, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:50:01,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.281e+01 2.553e+01 2.853e+01 4.118e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-15 11:50:06,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3167770.0, ans=0.125 2024-08-15 11:50:15,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3167870.0, ans=0.0 2024-08-15 11:50:22,129 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 11:50:22,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3167870.0, ans=0.125 2024-08-15 11:50:29,817 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 11:50:37,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3167970.0, ans=0.125 2024-08-15 11:50:48,947 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 11:50:58,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3168170.0, ans=0.125 2024-08-15 11:51:04,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3168170.0, ans=0.0 2024-08-15 11:51:07,421 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 11:51:11,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12500, loss[loss=0.1161, beats_loss=0.009346, ecapa_loss=0.0001614, whisper_loss=0.1051, over 22330.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.00015, whisper_loss=0.09141, over 3900206.09 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:51:26,857 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 11:51:28,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3168370.0, ans=0.1 2024-08-15 11:51:34,074 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 11:51:43,281 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 11:51:44,626 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 11:51:49,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3168470.0, ans=0.125 2024-08-15 11:51:55,036 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 11:52:07,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-15 11:52:07,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-15 11:52:18,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.60 vs. 
limit=15.0 2024-08-15 11:52:20,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3168670.0, ans=0.125 2024-08-15 11:52:23,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3168670.0, ans=0.05 2024-08-15 11:52:26,418 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12550, loss[loss=0.1242, beats_loss=0.01092, ecapa_loss=0.0001205, whisper_loss=0.112, over 18860.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001516, whisper_loss=0.09125, over 3900702.57 frames. ], batch size: 70, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:52:29,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.452e+01 2.757e+01 2.941e+01 1.392e+02, threshold=5.513e+01, percent-clipped=1.0 2024-08-15 11:52:58,678 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 11:53:12,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3169070.0, ans=0.1 2024-08-15 11:53:20,118 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 11:53:20,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3169070.0, ans=0.0 2024-08-15 11:53:29,682 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-15 11:53:32,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3169170.0, ans=0.2 2024-08-15 11:53:33,946 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
14 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 11:53:34,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3169170.0, ans=0.125 2024-08-15 11:53:40,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-08-15 11:53:40,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12600, loss[loss=0.1195, beats_loss=0.011, ecapa_loss=0.0001397, whisper_loss=0.1071, over 24070.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001514, whisper_loss=0.09105, over 3860341.10 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:53:41,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3169270.0, ans=0.125 2024-08-15 11:53:49,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2024-08-15 11:54:26,075 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-15 11:54:32,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3169570.0, ans=0.0 2024-08-15 11:54:38,145 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-15 11:54:41,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3169670.0, ans=0.0 2024-08-15 11:54:42,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. 
limit=15.0 2024-08-15 11:54:50,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3169670.0, ans=0.125 2024-08-15 11:54:56,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12650, loss[loss=0.1002, beats_loss=0.01104, ecapa_loss=0.0001288, whisper_loss=0.08791, over 15445.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001508, whisper_loss=0.09089, over 3841854.34 frames. ], batch size: 60, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:54:58,956 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.444e+01 2.686e+01 2.954e+01 5.186e+01, threshold=5.373e+01, percent-clipped=0.0 2024-08-15 11:55:00,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3169770.0, ans=0.125 2024-08-15 11:55:02,056 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 11:55:03,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3169770.0, ans=0.125 2024-08-15 11:55:12,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3169870.0, ans=0.1 2024-08-15 11:55:28,125 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 11:55:31,035 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-15 11:55:51,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3170070.0, ans=0.2 2024-08-15 11:56:02,912 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 11:56:04,780 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 11:56:09,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12700, loss[loss=0.09805, beats_loss=0.01148, ecapa_loss=0.0001731, whisper_loss=0.08483, over 21458.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001501, whisper_loss=0.09119, over 3888554.40 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:56:16,805 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 11:56:27,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3170370.0, ans=10.0 2024-08-15 11:56:38,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3170470.0, ans=0.125 2024-08-15 11:56:51,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3170570.0, ans=0.2 2024-08-15 11:57:19,322 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 11:57:22,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12750, loss[loss=0.09884, beats_loss=0.01083, ecapa_loss=0.0001847, whisper_loss=0.08617, over 22377.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001507, whisper_loss=0.09106, over 3908982.53 frames. ], batch size: 93, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:57:25,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.276e+01 2.433e+01 2.763e+01 4.017e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-15 11:57:35,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.93 vs. 
limit=15.0 2024-08-15 11:57:46,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2024-08-15 11:57:55,516 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 11:58:12,704 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 11:58:22,020 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-15 11:58:26,426 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 11:58:28,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-15 11:58:32,533 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 32 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 11:58:39,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3171270.0, ans=0.0 2024-08-15 11:58:39,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12800, loss[loss=0.101, beats_loss=0.01181, ecapa_loss=0.0001391, whisper_loss=0.08783, over 22557.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0107, ecapa_loss=0.0001511, whisper_loss=0.09163, over 3926418.62 frames. ], batch size: 92, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:58:44,945 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 11:58:47,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.47 vs. limit=10.0 2024-08-15 11:58:59,630 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
24 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-15 11:59:00,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3171370.0, ans=0.125 2024-08-15 11:59:01,034 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-15 11:59:10,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3171470.0, ans=0.125 2024-08-15 11:59:15,622 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 16 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 11:59:32,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3171570.0, ans=0.1 2024-08-15 11:59:54,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12850, loss[loss=0.08532, beats_loss=0.0112, ecapa_loss=0.0001459, whisper_loss=0.07266, over 17673.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001506, whisper_loss=0.09099, over 3857460.27 frames. ], batch size: 74, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:59:57,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.257e+01 2.519e+01 2.816e+01 4.550e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-15 12:00:11,286 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-15 12:00:23,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3171970.0, ans=0.0 2024-08-15 12:01:08,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12900, loss[loss=0.1179, beats_loss=0.009376, ecapa_loss=0.0001375, whisper_loss=0.1072, over 24155.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001502, whisper_loss=0.09095, over 3851266.43 frames. 
], batch size: 93, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:01:11,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0 2024-08-15 12:01:15,502 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-15 12:01:29,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3172370.0, ans=0.125 2024-08-15 12:01:31,709 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 12:01:38,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3172470.0, ans=0.1 2024-08-15 12:01:46,829 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 12:01:51,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3172570.0, ans=0.0 2024-08-15 12:01:54,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3172570.0, ans=0.125 2024-08-15 12:02:05,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3172570.0, ans=0.1 2024-08-15 12:02:21,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 12950, loss[loss=0.118, beats_loss=0.009557, ecapa_loss=0.0001176, whisper_loss=0.1073, over 24171.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01065, ecapa_loss=0.0001494, whisper_loss=0.09112, over 3836738.08 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:02:22,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. 
limit=15.0 2024-08-15 12:02:24,919 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.275e+01 2.546e+01 2.873e+01 4.108e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-15 12:02:40,011 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 17 from LS+wenet, 18 from Vox, 55 fro AS 2024-08-15 12:02:47,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3172870.0, ans=0.2 2024-08-15 12:02:57,622 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 12:02:59,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3172970.0, ans=0.125 2024-08-15 12:03:16,376 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-15 12:03:22,998 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-15 12:03:25,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3173170.0, ans=0.0 2024-08-15 12:03:32,807 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 12:03:33,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3173170.0, ans=0.125 2024-08-15 12:03:36,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3173270.0, ans=0.125 2024-08-15 12:03:37,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13000, loss[loss=0.09233, beats_loss=0.009781, ecapa_loss=0.0001341, whisper_loss=0.0812, over 14344.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001505, whisper_loss=0.09116, over 3877793.50 frames. 
], batch size: 56, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:03:55,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3173370.0, ans=0.2 2024-08-15 12:04:02,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-15 12:04:03,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3173370.0, ans=0.125 2024-08-15 12:04:15,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3173470.0, ans=0.0 2024-08-15 12:04:16,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=12.0 2024-08-15 12:04:18,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173470.0, ans=0.1 2024-08-15 12:04:20,487 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0589878149330616, model_norm_threshold=50.92251968383789 2024-08-15 12:04:20,680 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.49, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.672e+05, grad_sumsq=3.654e+07, orig_rms_sq=1.005e-02 2024-08-15 12:04:28,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3173570.0, ans=0.0 2024-08-15 12:04:47,925 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 12:04:51,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13050, loss[loss=0.09501, beats_loss=0.01015, ecapa_loss=0.0001834, whisper_loss=0.08302, over 18636.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.0905, over 3860066.46 frames. ], batch size: 79, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:04:53,613 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 12:04:54,722 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.413e+01 2.554e+01 2.771e+01 8.633e+02, threshold=5.107e+01, percent-clipped=2.0 2024-08-15 12:04:58,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3173770.0, ans=0.125 2024-08-15 12:05:03,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3173770.0, ans=0.125 2024-08-15 12:05:13,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3173870.0, ans=0.09899494936611666 2024-08-15 12:05:14,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3173870.0, ans=0.0 2024-08-15 12:05:26,260 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 12:05:37,084 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-15 12:05:40,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2024-08-15 12:05:50,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3174170.0, ans=0.0 2024-08-15 12:05:59,394 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-15 12:06:06,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13100, loss[loss=0.1128, beats_loss=0.009065, ecapa_loss=0.0001139, whisper_loss=0.1026, over 21483.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001494, whisper_loss=0.09047, over 3860090.83 frames. ], batch size: 80, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:06:19,537 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 12:06:43,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3174470.0, ans=0.125 2024-08-15 12:07:04,990 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 12:07:07,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3174670.0, ans=0.1 2024-08-15 12:07:15,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3174670.0, ans=0.0 2024-08-15 12:07:19,804 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 12:07:20,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13150, loss[loss=0.0825, beats_loss=0.01173, ecapa_loss=0.0001264, whisper_loss=0.0695, over 14095.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001486, whisper_loss=0.09035, over 3841947.66 frames. 
], batch size: 57, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:07:23,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.322e+01 2.580e+01 2.894e+01 4.254e+01, threshold=5.159e+01, percent-clipped=0.0 2024-08-15 12:07:26,034 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.242e-02 2024-08-15 12:07:33,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3174770.0, ans=0.125 2024-08-15 12:07:36,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3174870.0, ans=0.0 2024-08-15 12:08:01,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2024-08-15 12:08:04,072 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 12:08:04,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3175070.0, ans=0.2 2024-08-15 12:08:19,795 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-15 12:08:34,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13200, loss[loss=0.09429, beats_loss=0.01159, ecapa_loss=0.0001911, whisper_loss=0.08079, over 18855.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001491, whisper_loss=0.09016, over 3864645.68 frames. 
], batch size: 80, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:08:36,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3175270.0, ans=0.2 2024-08-15 12:08:39,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3175270.0, ans=0.09899494936611666 2024-08-15 12:08:58,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-15 12:09:34,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2024-08-15 12:09:35,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3175670.0, ans=0.0 2024-08-15 12:09:39,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3175670.0, ans=0.95 2024-08-15 12:09:42,701 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 12:09:50,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13250, loss[loss=0.1019, beats_loss=0.009807, ecapa_loss=0.0001328, whisper_loss=0.09075, over 13996.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001502, whisper_loss=0.09086, over 3865385.48 frames. ], batch size: 54, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:09:53,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.280e+01 2.540e+01 2.785e+01 5.121e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-15 12:09:53,678 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-15 12:09:56,339 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 12:09:59,370 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 40 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 12:10:18,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.16 vs. limit=12.0 2024-08-15 12:10:23,693 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 12:10:26,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2024-08-15 12:10:29,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3175970.0, ans=0.0 2024-08-15 12:10:45,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3176070.0, ans=0.125 2024-08-15 12:10:47,937 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 12:10:55,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3176170.0, ans=0.0 2024-08-15 12:10:58,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3176170.0, ans=0.125 2024-08-15 12:11:05,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13300, loss[loss=0.09558, beats_loss=0.01008, ecapa_loss=0.0001576, whisper_loss=0.08392, over 19460.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001495, whisper_loss=0.09066, over 3835558.64 frames. 
], batch size: 79, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:11:17,881 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.700e+01 2024-08-15 12:11:25,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3176370.0, ans=0.2 2024-08-15 12:11:52,425 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 12:12:04,196 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 12:12:07,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3176670.0, ans=0.05 2024-08-15 12:12:07,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3176670.0, ans=0.125 2024-08-15 12:12:08,816 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 12:12:18,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13350, loss[loss=0.08321, beats_loss=0.01282, ecapa_loss=0.0001479, whisper_loss=0.06891, over 21717.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.00015, whisper_loss=0.09053, over 3849230.58 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:12:21,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.329e+01 2.607e+01 2.940e+01 2.592e+02, threshold=5.213e+01, percent-clipped=3.0 2024-08-15 12:12:22,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3176770.0, ans=0.1 2024-08-15 12:12:24,653 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 12:12:26,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3176770.0, ans=0.2 2024-08-15 12:12:28,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2024-08-15 12:12:37,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3176870.0, ans=0.125 2024-08-15 12:12:41,757 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 12:12:51,507 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 12:13:05,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3177070.0, ans=0.0 2024-08-15 12:13:16,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3177170.0, ans=0.125 2024-08-15 12:13:18,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3177170.0, ans=0.125 2024-08-15 12:13:27,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3177170.0, ans=0.125 2024-08-15 12:13:32,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13400, loss[loss=0.09923, beats_loss=0.009745, ecapa_loss=0.0001492, whisper_loss=0.088, over 18404.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001489, whisper_loss=0.09017, over 3818996.73 frames. 
], batch size: 71, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:13:34,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3177270.0, ans=0.1 2024-08-15 12:13:35,725 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 12:13:36,002 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.027e-02 2024-08-15 12:13:38,694 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 12:14:19,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3177570.0, ans=0.2 2024-08-15 12:14:32,189 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 12:14:45,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13450, loss[loss=0.1028, beats_loss=0.009731, ecapa_loss=0.0001556, whisper_loss=0.09155, over 21825.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001499, whisper_loss=0.09019, over 3869308.35 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:14:48,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.536e+01 2.666e+01 2.899e+01 1.016e+02, threshold=5.331e+01, percent-clipped=2.0 2024-08-15 12:14:50,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3177770.0, ans=0.125 2024-08-15 12:15:18,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3177970.0, ans=0.0 2024-08-15 12:15:31,240 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 12:15:43,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3178070.0, ans=0.0 2024-08-15 12:15:46,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-15 12:16:00,554 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13500, loss[loss=0.09705, beats_loss=0.01283, ecapa_loss=0.0001226, whisper_loss=0.08299, over 20123.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001505, whisper_loss=0.09037, over 3879915.87 frames. ], batch size: 79, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:16:00,935 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 12:16:09,724 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 12:16:20,672 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 12:16:25,062 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 12:16:25,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3178370.0, ans=0.2 2024-08-15 12:16:25,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3178370.0, ans=0.1 2024-08-15 12:16:30,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-08-15 12:16:31,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3178470.0, ans=0.0 2024-08-15 12:16:43,049 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 12:16:50,319 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 12:16:53,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3178570.0, ans=0.2 2024-08-15 12:16:55,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3178570.0, ans=0.2 2024-08-15 12:17:00,618 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 12:17:14,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13550, loss[loss=0.07745, beats_loss=0.01298, ecapa_loss=0.0001747, whisper_loss=0.06272, over 19361.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001514, whisper_loss=0.09019, over 3873398.16 frames. ], batch size: 84, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:17:17,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.308e+01 2.563e+01 2.825e+01 4.152e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-15 12:17:23,697 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-15 12:17:33,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. 
limit=22.5 2024-08-15 12:17:37,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3178870.0, ans=0.125 2024-08-15 12:17:41,409 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.815e-01 2024-08-15 12:17:59,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3179070.0, ans=0.0 2024-08-15 12:18:23,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3179170.0, ans=0.0 2024-08-15 12:18:28,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13600, loss[loss=0.1067, beats_loss=0.01011, ecapa_loss=0.0001696, whisper_loss=0.09494, over 21736.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001502, whisper_loss=0.08993, over 3866755.65 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:18:28,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3179270.0, ans=0.125 2024-08-15 12:18:31,773 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-15 12:18:34,436 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-15 12:19:01,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-15 12:19:03,765 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
39 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-15 12:19:07,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3179470.0, ans=0.125 2024-08-15 12:19:08,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179470.0, ans=0.1 2024-08-15 12:19:22,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3179570.0, ans=0.1 2024-08-15 12:19:27,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3179670.0, ans=0.0 2024-08-15 12:19:40,970 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 12:19:41,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13650, loss[loss=0.09115, beats_loss=0.01327, ecapa_loss=0.0001748, whisper_loss=0.07614, over 21291.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001506, whisper_loss=0.09007, over 3856614.49 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:19:44,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.314e+01 2.512e+01 2.853e+01 1.013e+02, threshold=5.025e+01, percent-clipped=2.0 2024-08-15 12:20:21,270 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 12:20:24,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-08-15 12:20:27,428 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 12:20:38,895 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 12:20:48,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3180170.0, ans=0.0 2024-08-15 12:20:49,402 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 12:20:55,493 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13700, loss[loss=0.1032, beats_loss=0.01111, ecapa_loss=0.0001399, whisper_loss=0.09073, over 20881.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001503, whisper_loss=0.09033, over 3860895.85 frames. ], batch size: 82, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:21:12,786 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 12:21:15,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3180370.0, ans=0.2 2024-08-15 12:21:22,083 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 12:21:27,502 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 12:21:55,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3180670.0, ans=0.125 2024-08-15 12:22:03,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.78 vs. limit=10.0 2024-08-15 12:22:10,324 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-15 12:22:11,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13750, loss[loss=0.1164, beats_loss=0.008901, ecapa_loss=0.000142, whisper_loss=0.1061, over 15413.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.0107, ecapa_loss=0.000151, whisper_loss=0.08971, over 3830951.34 frames. ], batch size: 56, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:22:14,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.271e+01 2.530e+01 2.885e+01 4.854e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 12:22:15,878 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 12:22:19,111 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 12:22:33,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3180870.0, ans=0.2 2024-08-15 12:22:57,285 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 12:23:00,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0 2024-08-15 12:23:01,209 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 12:23:09,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3181070.0, ans=0.0 2024-08-15 12:23:25,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2024-08-15 12:23:25,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13800, loss[loss=0.1291, beats_loss=0.009086, ecapa_loss=0.0001301, whisper_loss=0.1187, over 17592.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.09039, over 3868260.95 frames. ], batch size: 65, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:23:29,083 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
26 from LS+wenet, 16 from Vox, 31 from AS
2024-08-15 12:23:36,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3181270.0, ans=0.035
2024-08-15 12:24:07,476 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 from AS
2024-08-15 12:24:10,561 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 28 from Vox, 32 from AS
2024-08-15 12:24:25,512 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 12:24:40,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13850, loss[loss=0.1131, beats_loss=0.008618, ecapa_loss=0.0002007, whisper_loss=0.1025, over 21552.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.000151, whisper_loss=0.09105, over 3881853.37 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:24:41,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3181770.0, ans=0.125
2024-08-15 12:24:43,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.369e+01 2.668e+01 2.994e+01 7.332e+01, threshold=5.336e+01, percent-clipped=2.0
2024-08-15 12:24:50,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3181770.0, ans=0.125
2024-08-15 12:24:52,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3181770.0, ans=0.125
2024-08-15 12:24:58,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3181870.0, ans=0.0
2024-08-15 12:25:03,876 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS
2024-08-15 12:25:13,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5
2024-08-15 12:25:51,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3182170.0, ans=0.09899494936611666
2024-08-15 12:25:53,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13900, loss[loss=0.1023, beats_loss=0.01198, ecapa_loss=0.0001286, whisper_loss=0.08903, over 23035.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001517, whisper_loss=0.0906, over 3899165.20 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:25:53,895 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 from AS
2024-08-15 12:26:13,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3182370.0, ans=0.125
2024-08-15 12:26:29,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0
2024-08-15 12:26:35,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3182470.0, ans=0.125
2024-08-15 12:26:46,607 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 from AS
2024-08-15 12:26:51,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3182670.0, ans=0.125
2024-08-15 12:26:53,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0
2024-08-15 12:26:57,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3182670.0, ans=0.0
2024-08-15 12:27:01,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3182670.0, ans=0.2
2024-08-15 12:27:06,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 13950, loss[loss=0.1091, beats_loss=0.01081, ecapa_loss=0.0001102, whisper_loss=0.09721, over 21873.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001501, whisper_loss=0.09024, over 3895833.23 frames. ], batch size: 81, lr: 2.80e-03, grad_scale: 1.152921504606847e+18
2024-08-15 12:27:09,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.273e+01 2.481e+01 2.745e+01 4.473e+01, threshold=4.963e+01, percent-clipped=0.0
2024-08-15 12:27:32,471 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 from AS
2024-08-15 12:27:55,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3183070.0, ans=0.125
2024-08-15 12:28:15,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3183170.0, ans=0.0
2024-08-15 12:28:16,060 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 14 from Vox, 33 from AS
2024-08-15 12:28:19,114 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS
2024-08-15 12:28:20,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14000, loss[loss=0.1061, beats_loss=0.009615, ecapa_loss=0.0001465, whisper_loss=0.095, over 22090.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.00015, whisper_loss=0.0907, over 3878257.97 frames. ], batch size: 88, lr: 2.80e-03, grad_scale: 1.152921504606847e+18
2024-08-15 12:28:22,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3183270.0, ans=0.0
2024-08-15 12:28:22,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5
2024-08-15 12:28:36,986 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 from AS
2024-08-15 12:28:47,506 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 from AS
2024-08-15 12:28:48,938 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 from AS
2024-08-15 12:28:56,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3183470.0, ans=0.125
2024-08-15 12:29:04,772 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 36 from LS+wenet, 20 from Vox, 25 from AS
2024-08-15 12:29:13,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0
2024-08-15 12:29:34,308 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14050, loss[loss=0.08708, beats_loss=0.01248, ecapa_loss=0.0001377, whisper_loss=0.07322, over 22589.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001496, whisper_loss=0.09075, over 3872493.58 frames. ], batch size: 92, lr: 2.80e-03, grad_scale: 1.152921504606847e+18
2024-08-15 12:29:37,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.178e+01 2.428e+01 2.740e+01 4.100e+01, threshold=4.856e+01, percent-clipped=0.0
2024-08-15 12:29:51,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0
2024-08-15 12:29:53,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3183870.0, ans=0.125
2024-08-15 12:30:13,978 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS
2024-08-15 12:30:31,880 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 13 from Vox, 24 from AS
2024-08-15 12:30:41,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0
2024-08-15 12:30:49,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5
2024-08-15 12:30:50,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14100, loss[loss=0.1026, beats_loss=0.01112, ecapa_loss=0.0001105, whisper_loss=0.09035, over 20162.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.09114, over 3857410.43 frames. ], batch size: 78, lr: 2.80e-03, grad_scale: 1.152921504606847e+18
2024-08-15 12:30:56,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.88 vs. limit=22.5
2024-08-15 12:31:07,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3184370.0, ans=0.125
2024-08-15 12:31:21,956 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 from AS
2024-08-15 12:31:39,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3184570.0, ans=0.1
2024-08-15 12:31:44,082 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS
2024-08-15 12:32:03,074 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14150, loss[loss=0.1148, beats_loss=0.01042, ecapa_loss=0.0001371, whisper_loss=0.103, over 23308.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001489, whisper_loss=0.09083, over 3844598.83 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 1.152921504606847e+18
2024-08-15 12:32:06,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.412e+01 2.567e+01 2.890e+01 3.775e+01, threshold=5.134e+01, percent-clipped=0.0
2024-08-15 12:32:20,159 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 from AS
2024-08-15 12:32:44,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3184970.0, ans=0.125
2024-08-15 12:32:48,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0
2024-08-15 12:33:18,942 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 from AS
2024-08-15 12:33:22,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14200, loss[loss=0.1005, beats_loss=0.01288, ecapa_loss=0.0001181, whisper_loss=0.08646, over 21892.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.000148, whisper_loss=0.09107, over 3842642.49 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 1.152921504606847e+18
2024-08-15 12:33:22,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3185270.0, ans=0.0
2024-08-15 12:33:22,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0
2024-08-15 12:33:26,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3185270.0, ans=0.1
2024-08-15 12:33:28,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0
2024-08-15 12:33:48,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3185370.0, ans=0.125
2024-08-15 12:33:49,967 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 from AS
2024-08-15 12:34:16,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3185570.0, ans=0.05
2024-08-15 12:34:21,371 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 from AS
2024-08-15 12:34:35,915 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 17 from Vox, 53 from AS
2024-08-15 12:34:37,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3185670.0, ans=0.125
2024-08-15 12:34:44,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14250, loss[loss=0.08775, beats_loss=0.009982, ecapa_loss=0.0001468, whisper_loss=0.0763, over 20073.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001484, whisper_loss=0.09131, over 3863929.15 frames. ], batch size: 80, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:34:49,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.313e+01 2.543e+01 2.810e+01 4.306e+01, threshold=5.087e+01, percent-clipped=0.0
2024-08-15 12:35:19,249 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 17 from Vox, 41 from AS
2024-08-15 12:35:19,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.51 vs. limit=6.0
2024-08-15 12:35:34,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3185970.0, ans=0.0
2024-08-15 12:35:41,456 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 from AS
2024-08-15 12:35:53,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3186070.0, ans=0.125
2024-08-15 12:36:23,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14300, loss[loss=0.1454, beats_loss=0.006803, ecapa_loss=0.0001715, whisper_loss=0.1368, over 21841.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001479, whisper_loss=0.09117, over 3885059.46 frames. ], batch size: 83, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:36:32,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3186270.0, ans=0.125
2024-08-15 12:36:37,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0
2024-08-15 12:36:39,838 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 from AS
2024-08-15 12:36:42,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3186370.0, ans=0.0
2024-08-15 12:36:57,416 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 12:37:11,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3186470.0, ans=0.125
2024-08-15 12:37:24,886 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 from AS
2024-08-15 12:37:25,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3186570.0, ans=0.125
2024-08-15 12:37:45,230 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 from AS
2024-08-15 12:37:45,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3186670.0, ans=0.125
2024-08-15 12:38:03,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14350, loss[loss=0.1136, beats_loss=0.008243, ecapa_loss=0.0001468, whisper_loss=0.1039, over 19470.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001479, whisper_loss=0.09081, over 3888138.30 frames. ], batch size: 73, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:38:04,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3186770.0, ans=0.0
2024-08-15 12:38:09,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.292e+01 2.515e+01 2.764e+01 5.097e+01, threshold=5.030e+01, percent-clipped=1.0
2024-08-15 12:38:13,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186770.0, ans=0.1
2024-08-15 12:38:15,550 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 from AS
2024-08-15 12:38:22,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3186870.0, ans=0.0
2024-08-15 12:38:24,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0
2024-08-15 12:38:26,986 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 from AS
2024-08-15 12:38:56,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5
2024-08-15 12:39:19,544 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 from AS
2024-08-15 12:39:34,374 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 34 from LS+wenet, 19 from Vox, 26 from AS
2024-08-15 12:39:46,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14400, loss[loss=0.09635, beats_loss=0.01137, ecapa_loss=0.0001449, whisper_loss=0.08353, over 17147.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001476, whisper_loss=0.09052, over 3905563.53 frames. ], batch size: 72, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:39:48,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3187270.0, ans=0.0
2024-08-15 12:39:56,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3187270.0, ans=0.125
2024-08-15 12:40:15,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0
2024-08-15 12:40:23,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5
2024-08-15 12:40:37,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3187570.0, ans=0.0
2024-08-15 12:40:47,381 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 from AS
2024-08-15 12:41:00,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0
2024-08-15 12:41:06,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 22, batch 14450, loss[loss=0.08853, beats_loss=0.009442, ecapa_loss=0.0001508, whisper_loss=0.07758, over 15685.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.000148, whisper_loss=0.09077, over 3908708.60 frames. ], batch size: 60, lr: 2.80e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:41:11,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3187770.0, ans=0.0
2024-08-15 12:41:11,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3187770.0, ans=0.2
2024-08-15 12:41:12,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.363e+01 2.570e+01 2.963e+01 1.669e+02, threshold=5.140e+01, percent-clipped=2.0
2024-08-15 12:41:18,463 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 30 from Vox, 25 from AS
2024-08-15 12:41:26,553 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS
2024-08-15 12:41:51,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3188070.0, ans=0.0
2024-08-15 12:41:55,859 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 from AS
2024-08-15 12:41:58,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3188070.0, ans=0.2
2024-08-15 12:42:00,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3188070.0, ans=0.125
2024-08-15 12:42:03,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3188170.0, ans=0.125
2024-08-15 12:42:05,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3188170.0, ans=0.125
2024-08-15 12:42:14,079 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-22.pt
2024-08-15 12:42:46,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 0, loss[loss=0.08067, beats_loss=0.009902, ecapa_loss=0.0001746, whisper_loss=0.06902, over 19305.00 frames. ], tot_loss[loss=0.08067, beats_loss=0.009902, ecapa_loss=0.0001746, whisper_loss=0.06902, over 19305.00 frames. ], batch size: 79, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:42:46,625 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-15 12:43:28,463 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2464, over 922467.00 frames.
2024-08-15 12:43:45,302 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.00428, beats_loss=0, ecapa_loss=0.000428, whisper_loss=0, over 939242.00 frames.
2024-08-15 12:45:43,890 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02325, beats_loss=0.02325, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-15 12:45:43,899 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB
2024-08-15 12:46:00,495 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 from AS
2024-08-15 12:46:00,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3188220.0, ans=0.125
2024-08-15 12:46:34,872 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 12:46:58,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3188520.0, ans=0.1
2024-08-15 12:47:07,976 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 28 from Vox, 31 from AS
2024-08-15 12:47:23,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3188620.0, ans=0.0
2024-08-15 12:47:38,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3188620.0, ans=0.125
2024-08-15 12:47:50,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 50, loss[loss=0.08793, beats_loss=0.009877, ecapa_loss=0.0001658, whisper_loss=0.0764, over 14759.00 frames. ], tot_loss[loss=0.09891, beats_loss=0.009772, ecapa_loss=0.0001538, whisper_loss=0.0876, over 903651.55 frames. ], batch size: 62, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:47:53,061 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS
2024-08-15 12:48:13,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.413e+01 2.733e+01 3.074e+01 3.899e+01, threshold=5.466e+01, percent-clipped=0.0
2024-08-15 12:48:29,493 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 from AS
2024-08-15 12:48:57,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3188920.0, ans=0.0
2024-08-15 12:48:59,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3188920.0, ans=0.0
2024-08-15 12:49:21,541 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 from AS
2024-08-15 12:49:49,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 100, loss[loss=0.0766, beats_loss=0.01008, ecapa_loss=0.0001287, whisper_loss=0.06523, over 17025.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.00958, ecapa_loss=0.0001531, whisper_loss=0.08943, over 1559714.30 frames. ], batch size: 64, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:50:21,059 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 from AS
2024-08-15 12:50:31,669 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 19 from Vox, 27 from AS
2024-08-15 12:50:36,270 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 6 from Vox, 34 from AS
2024-08-15 12:50:38,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3189420.0, ans=0.0
2024-08-15 12:50:52,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3189420.0, ans=0.125
2024-08-15 12:50:58,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3189520.0, ans=0.125
2024-08-15 12:51:00,493 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 from AS
2024-08-15 12:51:18,331 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 from AS
2024-08-15 12:51:20,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3189620.0, ans=0.0
2024-08-15 12:51:41,283 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 150, loss[loss=0.09035, beats_loss=0.01074, ecapa_loss=0.0001508, whisper_loss=0.0781, over 22083.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.00968, ecapa_loss=0.0001509, whisper_loss=0.0892, over 2052078.91 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:51:48,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3189720.0, ans=0.125
2024-08-15 12:51:57,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.541e+01 2.794e+01 3.145e+01 4.567e+01, threshold=5.588e+01, percent-clipped=0.0
2024-08-15 12:53:04,411 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS
2024-08-15 12:53:05,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 200, loss[loss=0.09919, beats_loss=0.01199, ecapa_loss=0.0001617, whisper_loss=0.08558, over 22455.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009807, ecapa_loss=0.0001513, whisper_loss=0.0904, over 2436100.68 frames. ], batch size: 93, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:53:26,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3190320.0, ans=0.125
2024-08-15 12:53:28,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3190320.0, ans=22.5
2024-08-15 12:53:38,702 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 12:53:44,739 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 from AS
2024-08-15 12:53:58,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3190520.0, ans=0.2
2024-08-15 12:54:18,702 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 from AS
2024-08-15 12:54:24,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 250, loss[loss=0.1157, beats_loss=0.01083, ecapa_loss=0.0001644, whisper_loss=0.1032, over 22136.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01011, ecapa_loss=0.0001503, whisper_loss=0.08949, over 2758911.16 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:54:38,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.264e+01 2.507e+01 2.916e+01 4.701e+01, threshold=5.014e+01, percent-clipped=0.0
2024-08-15 12:54:40,710 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 from AS
2024-08-15 12:54:43,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3190820.0, ans=0.0
2024-08-15 12:54:45,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3190820.0, ans=0.125
2024-08-15 12:54:52,221 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 25 from Vox, 22 from AS
2024-08-15 12:54:52,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3190820.0, ans=0.125
2024-08-15 12:54:54,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3190920.0, ans=0.09899494936611666
2024-08-15 12:54:54,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3190920.0, ans=0.2
2024-08-15 12:54:59,929 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 12 from Vox, 35 from AS
2024-08-15 12:55:00,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3190920.0, ans=0.09899494936611666
2024-08-15 12:55:04,703 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 from AS
2024-08-15 12:55:21,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3191020.0, ans=0.125
2024-08-15 12:55:25,449 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS
2024-08-15 12:55:38,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3191120.0, ans=0.125
2024-08-15 12:55:38,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0
2024-08-15 12:55:41,392 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 300, loss[loss=0.1041, beats_loss=0.01133, ecapa_loss=0.0001447, whisper_loss=0.09137, over 23661.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01017, ecapa_loss=0.0001504, whisper_loss=0.08907, over 2989607.85 frames. ], batch size: 94, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:55:46,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191220.0, ans=0.1
2024-08-15 12:55:51,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191220.0, ans=0.1
2024-08-15 12:56:04,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3191320.0, ans=0.2
2024-08-15 12:56:32,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191520.0, ans=0.1
2024-08-15 12:56:33,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3191520.0, ans=0.125
2024-08-15 12:56:48,365 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 13 from Vox, 33 from AS
2024-08-15 12:56:58,860 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 350, loss[loss=0.1006, beats_loss=0.009586, ecapa_loss=0.0001497, whisper_loss=0.08956, over 19305.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01028, ecapa_loss=0.0001494, whisper_loss=0.08875, over 3174918.71 frames. ], batch size: 76, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:57:09,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0
2024-08-15 12:57:09,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.62 vs. limit=10.0
2024-08-15 12:57:10,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3191720.0, ans=0.2
2024-08-15 12:57:12,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.342e+01 2.524e+01 2.862e+01 4.157e+01, threshold=5.047e+01, percent-clipped=0.0
2024-08-15 12:57:19,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3191820.0, ans=0.125
2024-08-15 12:57:33,760 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS
2024-08-15 12:57:38,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3191920.0, ans=0.125
2024-08-15 12:57:40,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3191920.0, ans=0.05
2024-08-15 12:57:49,297 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-15 12:57:58,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3192120.0, ans=0.125
2024-08-15 12:58:16,052 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 400, loss[loss=0.07915, beats_loss=0.008263, ecapa_loss=0.0001676, whisper_loss=0.06921, over 16729.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01027, ecapa_loss=0.00015, whisper_loss=0.08846, over 3277227.48 frames. ], batch size: 65, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:58:30,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3192320.0, ans=0.2
2024-08-15 12:58:40,877 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 23 from Vox, 40 from AS
2024-08-15 12:58:43,592 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 11 from LS+wenet, 20 from Vox, 28 from AS
2024-08-15 12:58:50,162 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 from AS
2024-08-15 12:58:51,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3192420.0, ans=0.5
2024-08-15 12:59:01,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=22.5
2024-08-15 12:59:07,055 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 18 from Vox, 47 from AS
2024-08-15 12:59:16,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3192520.0, ans=0.125
2024-08-15 12:59:26,136 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 13 from Vox, 35 from AS
2024-08-15 12:59:30,850 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 19 from Vox, 32 from AS
2024-08-15 12:59:35,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 450, loss[loss=0.07137, beats_loss=0.01115, ecapa_loss=0.0001664, whisper_loss=0.05856, over 12782.00 frames. ], tot_loss[loss=0.0999, beats_loss=0.01037, ecapa_loss=0.00015, whisper_loss=0.08803, over 3370530.65 frames. ], batch size: 54, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 12:59:42,644 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 21 from Vox, 17 from AS
2024-08-15 12:59:46,283 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 from AS
2024-08-15 12:59:49,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.262e+01 2.454e+01 2.784e+01 4.737e+01, threshold=4.907e+01, percent-clipped=0.0
2024-08-15 12:59:53,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3192820.0, ans=0.07
2024-08-15 13:00:14,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3192920.0, ans=0.0
2024-08-15 13:00:46,871 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 13:00:50,376 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 from AS
2024-08-15 13:00:58,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 500, loss[loss=0.1039, beats_loss=0.01153, ecapa_loss=0.0001269, whisper_loss=0.09106, over 23471.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001485, whisper_loss=0.08836, over 3460515.04 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:01:13,620 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS
2024-08-15 13:01:14,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=12.0
2024-08-15 13:01:45,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3193420.0, ans=0.0
2024-08-15 13:02:05,905 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 29 from LS+wenet, 14 from Vox, 23 from AS
2024-08-15 13:02:30,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 550, loss[loss=0.1012, beats_loss=0.0117, ecapa_loss=0.0001214, whisper_loss=0.08828, over 18814.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01034, ecapa_loss=0.0001489, whisper_loss=0.08898, over 3519923.39 frames. ], batch size: 73, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:02:45,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.304e+01 2.513e+01 2.793e+01 3.514e+01, threshold=5.025e+01, percent-clipped=0.0
2024-08-15 13:02:59,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3193820.0, ans=0.0
2024-08-15 13:02:59,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3193820.0, ans=0.0
2024-08-15 13:03:05,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3193920.0, ans=0.0
2024-08-15 13:03:07,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3193920.0, ans=0.0
2024-08-15 13:03:07,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3193920.0, ans=0.125
2024-08-15 13:03:11,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3193920.0, ans=0.125
2024-08-15 13:03:17,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3193920.0, ans=0.05
2024-08-15 13:03:35,259 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 26 from Vox, 26 from AS
2024-08-15 13:03:39,049 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 13:03:44,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3194120.0, ans=0.125 2024-08-15 13:03:47,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3194120.0, ans=0.1 2024-08-15 13:03:50,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-08-15 13:03:56,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 600, loss[loss=0.09245, beats_loss=0.01167, ecapa_loss=0.0001294, whisper_loss=0.07949, over 18710.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001484, whisper_loss=0.09004, over 3602557.54 frames. ], batch size: 69, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:04:01,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3194220.0, ans=0.125 2024-08-15 13:04:06,297 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-15 13:04:09,692 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:04:12,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3194320.0, ans=0.0 2024-08-15 13:04:21,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-15 13:04:31,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3194420.0, ans=0.1 2024-08-15 13:04:32,602 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
23 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 13:04:39,187 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 13:04:44,451 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-15 13:04:51,390 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 13:05:04,788 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-15 13:05:07,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 650, loss[loss=0.07959, beats_loss=0.01173, ecapa_loss=0.0001732, whisper_loss=0.06612, over 17933.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001488, whisper_loss=0.09028, over 3655784.52 frames. ], batch size: 75, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:05:18,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.357e+01 2.569e+01 2.869e+01 2.947e+02, threshold=5.138e+01, percent-clipped=4.0 2024-08-15 13:05:40,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3194920.0, ans=0.2 2024-08-15 13:05:41,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.08 vs. 
limit=15.0 2024-08-15 13:05:47,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3195020.0, ans=0.0 2024-08-15 13:06:00,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3195120.0, ans=0.1 2024-08-15 13:06:08,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3195120.0, ans=0.125 2024-08-15 13:06:12,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 700, loss[loss=0.1126, beats_loss=0.009322, ecapa_loss=0.0001516, whisper_loss=0.1018, over 20836.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01027, ecapa_loss=0.000149, whisper_loss=0.09086, over 3721917.34 frames. ], batch size: 85, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:06:13,804 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 13:06:39,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.58 vs. limit=22.5 2024-08-15 13:06:45,156 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:06:47,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3195420.0, ans=0.0 2024-08-15 13:06:47,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3195420.0, ans=0.2 2024-08-15 13:07:02,331 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 13:07:16,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 750, loss[loss=0.1237, beats_loss=0.007943, ecapa_loss=0.0001532, whisper_loss=0.1142, over 23152.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001488, whisper_loss=0.09051, over 3761190.63 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:07:19,221 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 13:07:19,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2024-08-15 13:07:21,773 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-15 13:07:28,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.327e+01 2.582e+01 2.848e+01 1.200e+02, threshold=5.164e+01, percent-clipped=2.0 2024-08-15 13:07:29,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0 2024-08-15 13:07:39,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3195820.0, ans=0.125 2024-08-15 13:07:54,978 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:07:57,256 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-15 13:07:59,813 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 13:08:11,755 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 13:08:14,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.74 vs. 
limit=22.5 2024-08-15 13:08:21,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 800, loss[loss=0.09722, beats_loss=0.008793, ecapa_loss=0.0002172, whisper_loss=0.08625, over 21112.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01026, ecapa_loss=0.0001489, whisper_loss=0.09082, over 3781667.01 frames. ], batch size: 91, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:08:26,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3196220.0, ans=0.09899494936611666 2024-08-15 13:08:31,063 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-15 13:09:01,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-08-15 13:09:13,190 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 13:09:23,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3196620.0, ans=0.125 2024-08-15 13:09:24,586 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-15 13:09:27,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 850, loss[loss=0.09377, beats_loss=0.01087, ecapa_loss=0.0001691, whisper_loss=0.08121, over 21591.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001488, whisper_loss=0.0904, over 3790027.00 frames. 
], batch size: 91, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:09:32,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196720.0, ans=0.1 2024-08-15 13:09:36,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3196720.0, ans=0.125 2024-08-15 13:09:38,921 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.346e+01 2.636e+01 2.893e+01 3.086e+02, threshold=5.271e+01, percent-clipped=3.0 2024-08-15 13:09:49,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3196820.0, ans=0.2 2024-08-15 13:09:54,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3196920.0, ans=0.5 2024-08-15 13:09:58,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3196920.0, ans=0.0 2024-08-15 13:10:33,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 900, loss[loss=0.0919, beats_loss=0.01087, ecapa_loss=0.0001408, whisper_loss=0.07962, over 18053.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01025, ecapa_loss=0.0001497, whisper_loss=0.0902, over 3785095.81 frames. 
], batch size: 72, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:10:38,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3197220.0, ans=0.0 2024-08-15 13:10:41,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197220.0, ans=0.1 2024-08-15 13:10:54,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3197320.0, ans=0.0 2024-08-15 13:11:04,426 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-15 13:11:08,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3197420.0, ans=0.0 2024-08-15 13:11:15,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3197520.0, ans=0.2 2024-08-15 13:11:17,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3197520.0, ans=0.0 2024-08-15 13:11:26,526 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 38 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 13:11:28,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3197620.0, ans=22.5 2024-08-15 13:11:38,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 950, loss[loss=0.1214, beats_loss=0.009411, ecapa_loss=0.0001465, whisper_loss=0.1105, over 23463.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001488, whisper_loss=0.08998, over 3799084.65 frames. ], batch size: 91, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:11:43,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.23 vs. 
limit=22.5 2024-08-15 13:11:44,704 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 13:11:44,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3197720.0, ans=0.125 2024-08-15 13:11:50,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.292e+01 2.595e+01 2.867e+01 1.968e+02, threshold=5.190e+01, percent-clipped=1.0 2024-08-15 13:11:55,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3197820.0, ans=0.035 2024-08-15 13:12:18,374 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 13:12:22,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3198020.0, ans=0.125 2024-08-15 13:12:32,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3198120.0, ans=0.2 2024-08-15 13:12:37,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198120.0, ans=0.1 2024-08-15 13:12:44,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1000, loss[loss=0.106, beats_loss=0.006715, ecapa_loss=0.0001682, whisper_loss=0.09764, over 17400.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001487, whisper_loss=0.08993, over 3798236.84 frames. ], batch size: 65, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:12:50,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3198220.0, ans=0.07 2024-08-15 13:12:53,785 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
28 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 13:13:09,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3198420.0, ans=0.125 2024-08-15 13:13:15,024 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-15 13:13:26,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3198520.0, ans=0.125 2024-08-15 13:13:45,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3198620.0, ans=0.125 2024-08-15 13:13:48,625 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 13:13:49,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1050, loss[loss=0.1051, beats_loss=0.009909, ecapa_loss=0.0001427, whisper_loss=0.09375, over 14885.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001487, whisper_loss=0.09028, over 3810524.39 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:14:00,212 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 13:14:01,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.361e+01 2.585e+01 2.930e+01 4.862e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 13:14:01,534 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 13:14:06,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-08-15 13:14:06,539 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 13:14:10,570 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.776e-02 2024-08-15 13:14:29,901 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 13:14:32,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3199020.0, ans=0.125 2024-08-15 13:14:48,189 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 13:14:54,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1100, loss[loss=0.09259, beats_loss=0.01186, ecapa_loss=0.0001307, whisper_loss=0.07942, over 17045.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001476, whisper_loss=0.08947, over 3810056.74 frames. ], batch size: 68, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:14:56,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3199220.0, ans=0.125 2024-08-15 13:14:59,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3199220.0, ans=0.04949747468305833 2024-08-15 13:14:59,947 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 13:15:03,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3199220.0, ans=0.125 2024-08-15 13:15:06,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199320.0, ans=0.1 2024-08-15 13:15:14,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3199320.0, ans=0.0 2024-08-15 13:15:14,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3199320.0, ans=0.04949747468305833 2024-08-15 13:15:23,070 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 13:15:29,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-15 13:15:38,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3199520.0, ans=0.2 2024-08-15 13:15:45,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3199620.0, ans=0.125 2024-08-15 13:15:48,170 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 13:15:59,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1150, loss[loss=0.1063, beats_loss=0.008734, ecapa_loss=0.0001719, whisper_loss=0.09587, over 21804.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001479, whisper_loss=0.08937, over 3803453.26 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:15:59,935 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 13:16:08,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3199720.0, ans=0.125 2024-08-15 13:16:11,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.313e+01 2.590e+01 2.898e+01 5.614e+01, threshold=5.180e+01, percent-clipped=1.0 2024-08-15 13:16:15,472 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 13:16:21,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199820.0, ans=0.1 2024-08-15 13:16:26,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3199920.0, ans=0.125 2024-08-15 13:16:26,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3199920.0, ans=0.0 2024-08-15 13:16:32,805 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 38 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 13:16:35,362 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-320000.pt 2024-08-15 13:16:52,427 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-15 13:16:55,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3200120.0, ans=0.125 2024-08-15 13:17:04,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. 
limit=15.0 2024-08-15 13:17:09,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1200, loss[loss=0.1155, beats_loss=0.01003, ecapa_loss=0.0001213, whisper_loss=0.1043, over 22016.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001468, whisper_loss=0.09015, over 3866219.24 frames. ], batch size: 80, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:17:10,750 WARNING [optim.py:496] (0/4) Scaling gradients by 0.052070412784814835, model_norm_threshold=51.8048095703125 2024-08-15 13:17:10,938 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.791e+07, orig_rms_sq=1.005e-02 2024-08-15 13:17:14,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3200220.0, ans=0.0 2024-08-15 13:17:20,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3200220.0, ans=0.0 2024-08-15 13:17:26,235 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=15.0 2024-08-15 13:17:38,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.87 vs. limit=22.5 2024-08-15 13:17:42,740 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:18:11,005 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 13:18:15,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1250, loss[loss=0.1008, beats_loss=0.01101, ecapa_loss=0.0001057, whisper_loss=0.08869, over 21184.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001459, whisper_loss=0.08998, over 3833242.87 frames. 
], batch size: 81, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:18:16,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3200720.0, ans=0.07 2024-08-15 13:18:23,610 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 13:18:27,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.258e+01 2.452e+01 2.719e+01 9.949e+02, threshold=4.904e+01, percent-clipped=2.0 2024-08-15 13:18:30,436 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 13:18:31,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3200820.0, ans=0.1 2024-08-15 13:18:33,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3200820.0, ans=0.0 2024-08-15 13:18:35,643 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 13:18:48,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-15 13:18:54,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-15 13:18:59,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3201020.0, ans=0.0 2024-08-15 13:19:00,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3201020.0, ans=0.125 2024-08-15 13:19:01,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.12 vs. 
limit=22.5 2024-08-15 13:19:10,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3201120.0, ans=0.125 2024-08-15 13:19:21,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1300, loss[loss=0.1078, beats_loss=0.009817, ecapa_loss=0.0001256, whisper_loss=0.09675, over 15990.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01066, ecapa_loss=0.0001457, whisper_loss=0.08933, over 3827652.24 frames. ], batch size: 60, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:19:58,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3201420.0, ans=0.0 2024-08-15 13:20:13,223 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 13:20:22,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2024-08-15 13:20:23,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3201620.0, ans=0.1 2024-08-15 13:20:27,037 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1350, loss[loss=0.1063, beats_loss=0.00908, ecapa_loss=0.0001542, whisper_loss=0.09566, over 15175.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01071, ecapa_loss=0.0001456, whisper_loss=0.08931, over 3818858.60 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:20:28,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3201720.0, ans=0.0 2024-08-15 13:20:32,697 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 13:20:38,178 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
16 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 13:20:39,294 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.225e+01 2.528e+01 2.736e+01 6.244e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-15 13:20:44,229 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-15 13:20:52,189 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-15 13:20:53,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3201920.0, ans=0.05 2024-08-15 13:21:04,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3201920.0, ans=0.125 2024-08-15 13:21:05,342 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-15 13:21:10,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2024-08-15 13:21:11,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3202020.0, ans=0.0 2024-08-15 13:21:32,272 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 13:21:34,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1400, loss[loss=0.09265, beats_loss=0.01037, ecapa_loss=0.0002013, whisper_loss=0.08026, over 15829.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001452, whisper_loss=0.08977, over 3840832.35 frames. 
], batch size: 68, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:21:51,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3202320.0, ans=0.125 2024-08-15 13:21:51,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3202320.0, ans=0.0 2024-08-15 13:21:51,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-15 13:22:00,440 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:22:04,415 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 13:22:26,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202520.0, ans=0.1 2024-08-15 13:22:28,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3202520.0, ans=0.2 2024-08-15 13:22:47,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1450, loss[loss=0.09279, beats_loss=0.01087, ecapa_loss=0.0001543, whisper_loss=0.08037, over 22446.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01068, ecapa_loss=0.0001454, whisper_loss=0.08908, over 3817041.74 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:23:18,512 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 13:23:24,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.256e+01 2.495e+01 2.819e+01 4.681e+02, threshold=4.990e+01, percent-clipped=2.0 2024-08-15 13:23:28,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3202820.0, ans=0.125 2024-08-15 13:23:29,705 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 13:23:35,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3202820.0, ans=0.125 2024-08-15 13:23:54,020 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-15 13:24:02,775 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 13:24:25,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1500, loss[loss=0.09132, beats_loss=0.01373, ecapa_loss=0.0001328, whisper_loss=0.07627, over 22943.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01063, ecapa_loss=0.0001452, whisper_loss=0.08915, over 3804990.80 frames. ], batch size: 93, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:24:32,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3203220.0, ans=0.1 2024-08-15 13:24:34,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3203220.0, ans=0.0 2024-08-15 13:24:38,417 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 13:24:48,064 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 13:24:52,140 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 13:25:00,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3203420.0, ans=0.0 2024-08-15 13:25:02,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3203420.0, ans=0.04949747468305833 2024-08-15 13:25:38,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1550, loss[loss=0.08538, beats_loss=0.01038, ecapa_loss=0.0001149, whisper_loss=0.07385, over 16796.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001453, whisper_loss=0.08925, over 3808194.77 frames. ], batch size: 61, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:25:48,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-15 13:25:51,800 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.256e+01 2.497e+01 2.794e+01 4.870e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 13:26:07,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3203920.0, ans=0.05 2024-08-15 13:26:24,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3204020.0, ans=0.125 2024-08-15 13:26:27,532 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 13:26:31,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3204020.0, ans=0.1 2024-08-15 13:26:36,505 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.491e-02 2024-08-15 13:26:37,497 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 13:26:54,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1600, loss[loss=0.072, beats_loss=0.01449, ecapa_loss=0.000128, whisper_loss=0.05623, over 22553.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.000145, whisper_loss=0.08932, over 3821762.98 frames. ], batch size: 94, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:26:54,956 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 13:26:58,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5 2024-08-15 13:27:02,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3204220.0, ans=0.125 2024-08-15 13:27:05,797 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 13:27:07,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3204220.0, ans=0.125 2024-08-15 13:27:21,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3204320.0, ans=0.125 2024-08-15 13:27:33,317 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-15 13:27:46,024 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 13:27:57,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3204620.0, ans=0.125 2024-08-15 13:28:08,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1650, loss[loss=0.104, beats_loss=0.01021, ecapa_loss=0.0001516, whisper_loss=0.09226, over 15781.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001444, whisper_loss=0.09, over 3851531.11 frames. ], batch size: 63, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:28:10,503 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 13:28:16,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3204720.0, ans=0.2 2024-08-15 13:28:21,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.261e+01 2.464e+01 2.812e+01 1.426e+02, threshold=4.927e+01, percent-clipped=1.0 2024-08-15 13:28:47,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2024-08-15 13:28:54,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3205020.0, ans=0.125 2024-08-15 13:28:58,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3205020.0, ans=0.125 2024-08-15 13:29:01,183 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 31 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 13:29:14,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-15 13:29:22,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1700, loss[loss=0.1124, beats_loss=0.01069, ecapa_loss=0.000129, whisper_loss=0.1004, over 23551.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001434, whisper_loss=0.0905, over 3838827.77 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:29:55,409 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
18 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 13:30:12,174 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 13:30:13,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3205520.0, ans=0.125 2024-08-15 13:30:22,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3205620.0, ans=0.0 2024-08-15 13:30:22,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3205620.0, ans=0.125 2024-08-15 13:30:38,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1750, loss[loss=0.08793, beats_loss=0.01094, ecapa_loss=0.0001673, whisper_loss=0.07532, over 20829.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001439, whisper_loss=0.09006, over 3828939.43 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:30:44,401 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 14 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 13:30:44,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3205720.0, ans=0.125 2024-08-15 13:30:51,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.279e+01 2.476e+01 2.729e+01 6.838e+01, threshold=4.951e+01, percent-clipped=2.0 2024-08-15 13:31:19,284 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 13:31:33,117 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 13:31:36,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3206120.0, ans=0.125 2024-08-15 13:31:38,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2024-08-15 13:31:53,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1800, loss[loss=0.1001, beats_loss=0.008077, ecapa_loss=0.0001551, whisper_loss=0.09047, over 17604.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001441, whisper_loss=0.09001, over 3830016.56 frames. ], batch size: 67, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:32:09,263 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 13:32:15,311 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 13:32:19,304 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 13:32:21,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3206420.0, ans=0.0 2024-08-15 13:32:26,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3206420.0, ans=0.125 2024-08-15 13:32:50,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3206620.0, ans=0.2 2024-08-15 13:33:06,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1850, loss[loss=0.1182, beats_loss=0.0106, ecapa_loss=0.0001344, whisper_loss=0.1062, over 23591.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001447, whisper_loss=0.09045, over 3857351.51 frames. 
], batch size: 92, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:33:09,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=15.0 2024-08-15 13:33:14,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3206720.0, ans=0.0 2024-08-15 13:33:20,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.257e+01 2.507e+01 2.743e+01 3.719e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-15 13:33:25,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-15 13:33:26,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3206820.0, ans=0.0 2024-08-15 13:33:26,514 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.716e-03 2024-08-15 13:33:27,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3206820.0, ans=0.025 2024-08-15 13:33:49,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3206920.0, ans=0.125 2024-08-15 13:33:49,773 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 13:33:50,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. 
limit=15.0 2024-08-15 13:33:52,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3207020.0, ans=0.0 2024-08-15 13:33:59,595 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 13:34:21,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1900, loss[loss=0.05473, beats_loss=0.01328, ecapa_loss=0.0001744, whisper_loss=0.0397, over 12588.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001457, whisper_loss=0.08972, over 3827981.26 frames. ], batch size: 54, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:34:29,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=15.0 2024-08-15 13:34:52,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3207420.0, ans=0.0 2024-08-15 13:34:57,156 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 13:35:08,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207520.0, ans=0.1 2024-08-15 13:35:29,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=12.0 2024-08-15 13:35:30,877 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 13:35:36,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1950, loss[loss=0.1095, beats_loss=0.01105, ecapa_loss=0.0001657, whisper_loss=0.09675, over 16076.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001461, whisper_loss=0.08971, over 3813264.79 frames. 
], batch size: 64, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:35:40,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3207720.0, ans=0.1 2024-08-15 13:35:49,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.350e+01 2.550e+01 2.908e+01 4.451e+01, threshold=5.100e+01, percent-clipped=0.0 2024-08-15 13:36:00,034 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 13:36:01,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207820.0, ans=0.1 2024-08-15 13:36:05,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2024-08-15 13:36:08,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3207920.0, ans=0.0 2024-08-15 13:36:14,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-15 13:36:16,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-08-15 13:36:16,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.95 vs. 
limit=10.0 2024-08-15 13:36:17,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3207920.0, ans=0.025 2024-08-15 13:36:18,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3207920.0, ans=0.0 2024-08-15 13:36:29,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3208020.0, ans=0.95 2024-08-15 13:36:44,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3208120.0, ans=0.125 2024-08-15 13:36:48,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3208120.0, ans=0.2 2024-08-15 13:36:50,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2000, loss[loss=0.1058, beats_loss=0.006966, ecapa_loss=0.0001295, whisper_loss=0.09751, over 15943.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001444, whisper_loss=0.08919, over 3855593.48 frames. ], batch size: 55, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:37:03,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3208220.0, ans=0.125 2024-08-15 13:37:11,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0 2024-08-15 13:37:24,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2024-08-15 13:37:35,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.96 vs. 
limit=15.0 2024-08-15 13:37:39,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-08-15 13:37:45,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-08-15 13:38:08,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2050, loss[loss=0.1201, beats_loss=0.008782, ecapa_loss=0.0001491, whisper_loss=0.1099, over 19227.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001447, whisper_loss=0.08917, over 3824497.94 frames. ], batch size: 76, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:38:08,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3208720.0, ans=0.1 2024-08-15 13:38:22,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.501e+01 2.773e+01 1.854e+02, threshold=5.002e+01, percent-clipped=2.0 2024-08-15 13:38:29,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0 2024-08-15 13:38:38,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=22.5 2024-08-15 13:38:42,892 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 13:38:52,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3209020.0, ans=0.0 2024-08-15 13:39:05,634 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 13:39:15,976 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.042e-02 2024-08-15 13:39:22,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2100, loss[loss=0.1142, beats_loss=0.007703, ecapa_loss=0.0001619, whisper_loss=0.1049, over 20935.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001437, whisper_loss=0.08929, over 3811759.40 frames. ], batch size: 81, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:39:28,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3209220.0, ans=0.0 2024-08-15 13:39:32,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3209220.0, ans=0.125 2024-08-15 13:39:38,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0 2024-08-15 13:39:47,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3209320.0, ans=0.0 2024-08-15 13:39:54,266 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 13:40:14,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3209520.0, ans=0.0 2024-08-15 13:40:15,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2024-08-15 13:40:15,901 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 13:40:33,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3209620.0, ans=0.0 2024-08-15 13:40:35,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2150, loss[loss=0.09629, beats_loss=0.01096, ecapa_loss=0.0001493, whisper_loss=0.08383, over 19754.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001436, whisper_loss=0.08933, over 3829220.50 frames. ], batch size: 77, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:40:37,621 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-15 13:40:49,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.347e+01 2.631e+01 2.979e+01 4.158e+01, threshold=5.262e+01, percent-clipped=0.0 2024-08-15 13:40:50,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3209820.0, ans=0.0 2024-08-15 13:41:01,632 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.879e-02 2024-08-15 13:41:09,927 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 13:41:14,247 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 13:41:15,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3209920.0, ans=0.0 2024-08-15 13:41:23,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3210020.0, ans=0.1 2024-08-15 13:41:25,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3210020.0, ans=0.015 2024-08-15 13:41:49,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2200, loss[loss=0.1046, beats_loss=0.01243, ecapa_loss=0.0001339, whisper_loss=0.0908, over 21040.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001431, whisper_loss=0.0899, over 3837919.73 frames. ], batch size: 87, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:41:50,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=12.0 2024-08-15 13:42:09,167 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 13:42:10,784 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 13:42:20,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3210420.0, ans=0.1 2024-08-15 13:42:53,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3210620.0, ans=0.1 2024-08-15 13:42:57,913 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.745e+00 2024-08-15 13:43:04,769 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2250, loss[loss=0.09061, beats_loss=0.01013, ecapa_loss=0.0001582, whisper_loss=0.0789, over 14800.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001457, whisper_loss=0.09025, over 3830324.50 frames. ], batch size: 63, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:43:17,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.315e+01 2.592e+01 2.973e+01 1.052e+02, threshold=5.184e+01, percent-clipped=4.0 2024-08-15 13:43:44,842 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 38 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 13:43:49,211 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 13:43:59,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3211020.0, ans=0.2 2024-08-15 13:44:21,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2300, loss[loss=0.1155, beats_loss=0.01081, ecapa_loss=0.000153, whisper_loss=0.1032, over 23345.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001458, whisper_loss=0.09115, over 3873433.40 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:44:34,929 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 13:44:44,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3211320.0, ans=0.0 2024-08-15 13:45:03,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3211420.0, ans=0.0 2024-08-15 13:45:08,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3211420.0, ans=0.05 2024-08-15 13:45:16,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. 
limit=15.0 2024-08-15 13:45:47,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2350, loss[loss=0.1054, beats_loss=0.0113, ecapa_loss=0.0001501, whisper_loss=0.0926, over 22147.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001455, whisper_loss=0.09094, over 3859228.52 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:45:53,544 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 13:46:03,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.354e+01 2.614e+01 2.902e+01 1.801e+02, threshold=5.228e+01, percent-clipped=1.0 2024-08-15 13:46:14,967 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0751657783985138, model_norm_threshold=52.2847900390625 2024-08-15 13:46:15,143 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.469e+04, grad_sumsq=8.469e+04, orig_rms_sq=1.000e+00 2024-08-15 13:46:20,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3211920.0, ans=0.125 2024-08-15 13:46:40,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3212020.0, ans=0.125 2024-08-15 13:46:48,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3212020.0, ans=0.0 2024-08-15 13:46:59,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3212120.0, ans=0.125 2024-08-15 13:47:13,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2400, loss[loss=0.09803, beats_loss=0.01362, ecapa_loss=0.0001029, whisper_loss=0.08339, over 22277.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001471, whisper_loss=0.09114, over 3875382.33 frames. ], batch size: 89, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:47:14,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3212220.0, ans=0.1 2024-08-15 13:47:17,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3212220.0, ans=0.2 2024-08-15 13:47:26,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3212220.0, ans=0.0 2024-08-15 13:48:27,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3212620.0, ans=0.0 2024-08-15 13:48:32,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-15 13:48:33,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3212620.0, ans=0.125 2024-08-15 13:48:35,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2450, loss[loss=0.1123, beats_loss=0.01101, ecapa_loss=0.0001579, whisper_loss=0.09976, over 20085.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001473, whisper_loss=0.09056, over 3843785.65 frames. ], batch size: 80, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:48:49,808 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 13:48:51,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.196e+01 2.471e+01 2.708e+01 6.956e+02, threshold=4.941e+01, percent-clipped=1.0 2024-08-15 13:49:08,763 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
11 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 13:49:10,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-15 13:49:16,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3212920.0, ans=0.125 2024-08-15 13:49:25,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3213020.0, ans=0.125 2024-08-15 13:49:29,079 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-15 13:49:30,544 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 13:49:45,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3213120.0, ans=0.125 2024-08-15 13:49:49,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3213120.0, ans=0.0 2024-08-15 13:49:57,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2500, loss[loss=0.1057, beats_loss=0.01165, ecapa_loss=0.0001325, whisper_loss=0.09276, over 18142.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001479, whisper_loss=0.09042, over 3851275.99 frames. ], batch size: 74, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:50:08,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3213220.0, ans=0.125 2024-08-15 13:50:09,308 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 13:50:09,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3213220.0, ans=0.125 2024-08-15 13:50:13,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3213320.0, ans=0.125 2024-08-15 13:50:24,044 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 13:50:37,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3213420.0, ans=0.05 2024-08-15 13:50:39,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-15 13:50:45,996 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 13:50:46,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3213420.0, ans=0.1 2024-08-15 13:51:19,831 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 13:51:21,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3213720.0, ans=0.0 2024-08-15 13:51:22,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2550, loss[loss=0.07114, beats_loss=0.01475, ecapa_loss=0.0001088, whisper_loss=0.05531, over 14053.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001471, whisper_loss=0.09115, over 3868646.85 frames. 
], batch size: 55, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:51:37,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3213720.0, ans=0.125 2024-08-15 13:51:38,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.247e+01 2.527e+01 2.799e+01 4.421e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-15 13:52:09,259 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-15 13:52:38,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3214120.0, ans=0.125 2024-08-15 13:52:44,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3214120.0, ans=0.2 2024-08-15 13:52:46,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3214120.0, ans=0.1 2024-08-15 13:52:52,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2600, loss[loss=0.1124, beats_loss=0.01081, ecapa_loss=0.0001611, whisper_loss=0.1, over 20551.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.000148, whisper_loss=0.09123, over 3834017.98 frames. ], batch size: 84, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:52:55,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3214220.0, ans=0.125 2024-08-15 13:53:07,257 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
17 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-15 13:53:14,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3214320.0, ans=0.125 2024-08-15 13:53:18,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3214320.0, ans=0.125 2024-08-15 13:53:29,489 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 13:53:54,716 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-15 13:54:02,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3214620.0, ans=0.0 2024-08-15 13:54:17,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2650, loss[loss=0.09858, beats_loss=0.01228, ecapa_loss=9.595e-05, whisper_loss=0.08534, over 17186.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01044, ecapa_loss=0.000148, whisper_loss=0.09095, over 3809989.31 frames. 
], batch size: 64, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:54:32,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.308e+01 2.516e+01 2.935e+01 7.349e+01, threshold=5.032e+01, percent-clipped=1.0 2024-08-15 13:54:44,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3214820.0, ans=0.125 2024-08-15 13:55:01,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3214920.0, ans=0.0 2024-08-15 13:55:03,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3214920.0, ans=0.1 2024-08-15 13:55:16,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3215020.0, ans=0.0 2024-08-15 13:55:17,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2024-08-15 13:55:40,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3215220.0, ans=0.2 2024-08-15 13:55:41,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2700, loss[loss=0.1254, beats_loss=0.008961, ecapa_loss=0.0001864, whisper_loss=0.1146, over 20206.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001477, whisper_loss=0.09097, over 3860957.74 frames. ], batch size: 83, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:55:42,506 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 13:55:49,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.24 vs. 
limit=15.0 2024-08-15 13:56:09,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-08-15 13:56:13,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3215320.0, ans=0.125 2024-08-15 13:56:22,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-08-15 13:56:24,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3215420.0, ans=0.0 2024-08-15 13:56:36,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215520.0, ans=0.1 2024-08-15 13:56:49,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3215620.0, ans=0.025 2024-08-15 13:57:00,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215620.0, ans=0.1 2024-08-15 13:57:02,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2024-08-15 13:57:07,332 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 8 from Vox, 28 fro AS 2024-08-15 13:57:08,438 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2750, loss[loss=0.1127, beats_loss=0.01111, ecapa_loss=0.0001089, whisper_loss=0.1005, over 15256.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001474, whisper_loss=0.09096, over 3903986.44 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:57:20,848 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 13:57:23,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.384e+01 2.723e+01 3.158e+01 5.499e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-15 13:58:16,532 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 13:58:20,675 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-15 13:58:23,841 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 13:58:25,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3216120.0, ans=0.09899494936611666 2024-08-15 13:58:27,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3216120.0, ans=0.1 2024-08-15 13:58:32,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3216120.0, ans=0.125 2024-08-15 13:58:35,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2800, loss[loss=0.0971, beats_loss=0.009351, ecapa_loss=0.0001777, whisper_loss=0.08597, over 17423.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001473, whisper_loss=0.0911, over 3897858.60 frames. 
], batch size: 69, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:58:37,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3216220.0, ans=0.0 2024-08-15 13:58:46,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3216220.0, ans=0.0 2024-08-15 13:58:48,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3216220.0, ans=0.125 2024-08-15 13:58:51,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3216320.0, ans=0.125 2024-08-15 13:59:32,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3216520.0, ans=0.125 2024-08-15 13:59:38,607 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 13:59:49,592 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 13:59:51,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-15 13:59:59,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3216620.0, ans=0.2 2024-08-15 14:00:00,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-15 14:00:02,947 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2850, loss[loss=0.1013, beats_loss=0.01252, ecapa_loss=0.0001207, whisper_loss=0.08755, over 22026.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.09101, over 3877745.65 frames. 
], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:00:19,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.378e+01 2.685e+01 2.976e+01 3.795e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-15 14:00:52,095 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 14:00:56,220 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 16 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 14:00:58,482 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 14:01:05,082 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 14:01:05,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3217020.0, ans=0.125 2024-08-15 14:01:19,686 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 14:01:30,188 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2900, loss[loss=0.09202, beats_loss=0.009957, ecapa_loss=0.0001393, whisper_loss=0.08067, over 19324.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001481, whisper_loss=0.09096, over 3891735.16 frames. ], batch size: 75, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:01:39,273 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-15 14:01:56,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=8.0 2024-08-15 14:02:09,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3217420.0, ans=0.0 2024-08-15 14:02:15,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.76 vs. 
limit=22.5 2024-08-15 14:02:24,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3217520.0, ans=0.0 2024-08-15 14:02:51,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2950, loss[loss=0.09595, beats_loss=0.01052, ecapa_loss=0.0001894, whisper_loss=0.08354, over 19942.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001489, whisper_loss=0.09105, over 3887406.13 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:02:53,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3217720.0, ans=0.125 2024-08-15 14:03:03,287 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 14:03:05,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3217720.0, ans=0.0 2024-08-15 14:03:06,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.332e+01 2.577e+01 2.863e+01 4.280e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-15 14:03:13,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3217820.0, ans=0.0 2024-08-15 14:03:23,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=12.0 2024-08-15 14:03:25,623 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
19 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 14:03:42,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3218020.0, ans=0.2 2024-08-15 14:03:43,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3218020.0, ans=0.125 2024-08-15 14:03:45,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3218020.0, ans=0.0 2024-08-15 14:03:46,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=12.0 2024-08-15 14:04:15,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3218220.0, ans=0.125 2024-08-15 14:04:16,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3000, loss[loss=0.1118, beats_loss=0.01107, ecapa_loss=0.0001453, whisper_loss=0.09928, over 22351.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.00015, whisper_loss=0.09128, over 3874441.50 frames. ], batch size: 87, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:04:16,027 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 14:04:39,948 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.3341, 3.1443, 2.5188, 2.8262], device='cuda:0') 2024-08-15 14:04:54,934 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2523, beats_loss=0, ecapa_loss=0.0005381, whisper_loss=0.2469, over 922467.00 frames. 2024-08-15 14:05:14,511 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004148, beats_loss=0, ecapa_loss=0.0004148, whisper_loss=0, over 939242.00 frames. 
2024-08-15 14:07:09,239 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 14:07:09,243 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 14:07:09,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3218220.0, ans=0.04949747468305833 2024-08-15 14:07:11,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=12.0 2024-08-15 14:07:25,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3218320.0, ans=0.1 2024-08-15 14:07:58,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3218520.0, ans=0.125 2024-08-15 14:08:15,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3218620.0, ans=0.125 2024-08-15 14:08:26,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.66 vs. limit=10.0 2024-08-15 14:08:31,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3218620.0, ans=0.0 2024-08-15 14:08:31,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2024-08-15 14:08:33,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3050, loss[loss=0.1078, beats_loss=0.01189, ecapa_loss=0.000151, whisper_loss=0.09438, over 22706.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.0001495, whisper_loss=0.09157, over 3872872.51 frames. 
], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:08:38,707 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-15 14:08:51,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.307e+01 2.650e+01 2.894e+01 1.730e+02, threshold=5.300e+01, percent-clipped=1.0 2024-08-15 14:09:11,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3218920.0, ans=0.0 2024-08-15 14:09:15,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3218920.0, ans=0.07 2024-08-15 14:09:21,740 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.271e-01 2024-08-15 14:09:29,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3219020.0, ans=0.0 2024-08-15 14:09:29,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3219020.0, ans=0.2 2024-08-15 14:09:31,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3219020.0, ans=0.0 2024-08-15 14:09:32,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3219020.0, ans=0.125 2024-08-15 14:09:35,031 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 14:09:38,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3219020.0, ans=0.1 2024-08-15 14:10:01,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3100, loss[loss=0.1108, beats_loss=0.007887, ecapa_loss=0.0002057, whisper_loss=0.1009, over 14402.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01051, ecapa_loss=0.0001504, whisper_loss=0.09188, over 3877084.91 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:10:17,497 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 14:10:20,550 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 14:10:24,241 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 14:10:34,264 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 14:10:34,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3219420.0, ans=0.2 2024-08-15 14:10:55,300 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 14:10:57,278 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 14:10:57,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3219520.0, ans=0.125 2024-08-15 14:11:20,354 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 14:11:21,646 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 14:11:22,802 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3150, loss[loss=0.1134, beats_loss=0.01013, ecapa_loss=0.0001746, whisper_loss=0.1016, over 22737.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001499, whisper_loss=0.09165, over 3877763.62 frames. 
], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:11:38,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.271e+01 2.467e+01 2.810e+01 4.738e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-15 14:11:39,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3219820.0, ans=0.1 2024-08-15 14:11:40,747 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 14:11:48,050 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 14:12:03,680 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 14:12:14,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2024-08-15 14:12:17,328 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 14:12:17,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3220020.0, ans=0.125 2024-08-15 14:12:25,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3220020.0, ans=0.0 2024-08-15 14:12:30,848 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 14:12:35,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2024-08-15 14:12:40,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.66 vs. 
limit=22.5 2024-08-15 14:12:40,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-15 14:12:42,520 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 14:12:44,188 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 15 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-15 14:12:44,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3220120.0, ans=0.1 2024-08-15 14:12:48,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3200, loss[loss=0.08464, beats_loss=0.01304, ecapa_loss=0.0001316, whisper_loss=0.07029, over 22006.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01053, ecapa_loss=0.0001498, whisper_loss=0.09199, over 3884957.52 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:12:58,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3220220.0, ans=0.0 2024-08-15 14:13:23,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3220420.0, ans=0.1 2024-08-15 14:13:24,719 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 43 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-15 14:13:45,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2024-08-15 14:13:50,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.61 vs. 
limit=22.5
2024-08-15 14:13:55,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3220620.0, ans=0.2
2024-08-15 14:14:03,114 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 14:14:07,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3220620.0, ans=0.125
2024-08-15 14:14:14,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3250, loss[loss=0.1015, beats_loss=0.009475, ecapa_loss=0.000219, whisper_loss=0.08983, over 21064.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01043, ecapa_loss=0.0001514, whisper_loss=0.09253, over 3881754.44 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:14:15,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3220720.0, ans=0.125
2024-08-15 14:14:26,107 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 from AS
2024-08-15 14:14:30,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.376e+01 2.667e+01 3.123e+01 1.417e+02, threshold=5.334e+01, percent-clipped=1.0
2024-08-15 14:14:31,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3220820.0, ans=0.125
2024-08-15 14:14:36,031 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 23 from Vox, 22 from AS
2024-08-15 14:15:09,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3221020.0, ans=0.1
2024-08-15 14:15:13,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3221020.0, ans=0.0
2024-08-15 14:15:34,962 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 from AS
2024-08-15 14:15:38,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3300, loss[loss=0.09334, beats_loss=0.009043, ecapa_loss=0.0001461, whisper_loss=0.08283, over 16744.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01045, ecapa_loss=0.0001508, whisper_loss=0.09215, over 3841943.81 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:16:07,199 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 from AS
2024-08-15 14:16:33,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3221520.0, ans=0.1
2024-08-15 14:16:47,008 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 14:16:47,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3221620.0, ans=0.07
2024-08-15 14:16:51,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3221620.0, ans=0.125
2024-08-15 14:17:04,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3350, loss[loss=0.08748, beats_loss=0.01294, ecapa_loss=0.0001632, whisper_loss=0.0729, over 21965.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0105, ecapa_loss=0.0001505, whisper_loss=0.09234, over 3872233.89 frames. ], batch size: 93, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:17:19,380 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.262e+01 2.579e+01 2.848e+01 8.552e+01, threshold=5.158e+01, percent-clipped=1.0
2024-08-15 14:17:49,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0
2024-08-15 14:18:24,171 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 from AS
2024-08-15 14:18:27,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0
2024-08-15 14:18:29,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3222220.0, ans=0.05
2024-08-15 14:18:29,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3400, loss[loss=0.09184, beats_loss=0.01287, ecapa_loss=0.0001186, whisper_loss=0.07778, over 14990.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.000149, whisper_loss=0.09114, over 3872927.28 frames. ], batch size: 59, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:18:37,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3222220.0, ans=0.1
2024-08-15 14:18:40,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3222220.0, ans=0.0
2024-08-15 14:18:47,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3222320.0, ans=0.0
2024-08-15 14:18:52,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3222320.0, ans=0.125
2024-08-15 14:18:55,724 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.185e-02
2024-08-15 14:19:44,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3222620.0, ans=0.2
2024-08-15 14:19:45,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3222620.0, ans=0.07
2024-08-15 14:19:45,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0
2024-08-15 14:19:51,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3450, loss[loss=0.04782, beats_loss=0.01249, ecapa_loss=0.0001328, whisper_loss=0.03401, over 14179.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001494, whisper_loss=0.0901, over 3852583.96 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:20:07,407 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.344e+01 2.608e+01 2.883e+01 4.857e+01, threshold=5.217e+01, percent-clipped=0.0
2024-08-15 14:20:09,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0
2024-08-15 14:20:38,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3222920.0, ans=0.125
2024-08-15 14:20:43,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-08-15 14:20:49,738 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 14:20:50,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3223020.0, ans=0.125
2024-08-15 14:21:17,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3500, loss[loss=0.07792, beats_loss=0.01042, ecapa_loss=0.0001864, whisper_loss=0.06564, over 18511.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001499, whisper_loss=0.09008, over 3808391.41 frames. ], batch size: 75, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:21:25,130 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 20 from Vox, 38 from AS
2024-08-15 14:21:28,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3223220.0, ans=0.2
2024-08-15 14:21:33,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3223320.0, ans=0.1
2024-08-15 14:21:42,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3223320.0, ans=0.125
2024-08-15 14:21:42,613 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.419e-02
2024-08-15 14:21:53,531 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 29 from LS+wenet, 17 from Vox, 23 from AS
2024-08-15 14:21:53,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3223320.0, ans=0.2
2024-08-15 14:22:04,811 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 from AS
2024-08-15 14:22:19,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3223520.0, ans=0.125
2024-08-15 14:22:22,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3223520.0, ans=0.0
2024-08-15 14:22:28,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3223520.0, ans=0.1
2024-08-15 14:22:30,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3223520.0, ans=0.0
2024-08-15 14:22:43,189 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 from AS
2024-08-15 14:22:49,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3550, loss[loss=0.09665, beats_loss=0.01212, ecapa_loss=0.0001333, whisper_loss=0.0832, over 22201.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001495, whisper_loss=0.09009, over 3823882.31 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:22:51,503 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 from AS
2024-08-15 14:22:51,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3223720.0, ans=0.0
2024-08-15 14:22:55,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3223720.0, ans=0.125
2024-08-15 14:22:57,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3223720.0, ans=0.125
2024-08-15 14:23:02,783 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.289e+01 2.498e+01 2.772e+01 4.287e+01, threshold=4.995e+01, percent-clipped=0.0
2024-08-15 14:23:07,169 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 28 from Vox, 39 from AS
2024-08-15 14:23:19,953 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 from AS
2024-08-15 14:23:21,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3223920.0, ans=0.09899494936611666
2024-08-15 14:23:53,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224020.0, ans=0.1
2024-08-15 14:24:10,158 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 from AS
2024-08-15 14:24:14,341 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-15 14:24:25,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3600, loss[loss=0.118, beats_loss=0.009427, ecapa_loss=0.0001587, whisper_loss=0.1069, over 23231.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001492, whisper_loss=0.09071, over 3844003.59 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:24:38,154 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS
2024-08-15 14:24:38,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3224220.0, ans=0.0
2024-08-15 14:24:39,837 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 14:24:44,624 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 20 from Vox, 21 from AS
2024-08-15 14:25:20,172 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 from AS
2024-08-15 14:25:27,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0
2024-08-15 14:25:36,272 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 24 from Vox, 18 from AS
2024-08-15 14:25:59,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3224620.0, ans=0.2
2024-08-15 14:25:59,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=16.14 vs. limit=15.0
2024-08-15 14:26:09,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3650, loss[loss=0.09148, beats_loss=0.01171, ecapa_loss=0.0001501, whisper_loss=0.07827, over 22168.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001488, whisper_loss=0.09041, over 3820815.65 frames. ], batch size: 93, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:26:09,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3224720.0, ans=0.125
2024-08-15 14:26:26,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0
2024-08-15 14:26:28,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224720.0, ans=0.1
2024-08-15 14:26:30,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.327e+01 2.522e+01 2.934e+01 4.655e+01, threshold=5.044e+01, percent-clipped=0.0
2024-08-15 14:26:54,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3224820.0, ans=0.0
2024-08-15 14:27:17,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3224920.0, ans=0.0
2024-08-15 14:27:27,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0
2024-08-15 14:27:33,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3225020.0, ans=0.125
2024-08-15 14:27:34,755 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05073240399360657, model_norm_threshold=50.43817901611328
2024-08-15 14:27:34,955 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.552e+05, grad_sumsq=1.540e+07, orig_rms_sq=1.008e-02
2024-08-15 14:28:04,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3225120.0, ans=0.125
2024-08-15 14:28:22,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0
2024-08-15 14:28:23,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3700, loss[loss=0.09026, beats_loss=0.009171, ecapa_loss=0.000151, whisper_loss=0.07958, over 14262.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001484, whisper_loss=0.09062, over 3817566.07 frames. ], batch size: 54, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:28:29,199 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 37 from LS+wenet, 17 from Vox, 40 from AS
2024-08-15 14:28:33,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3225220.0, ans=0.125
2024-08-15 14:28:41,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3225220.0, ans=0.2
2024-08-15 14:28:41,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3225220.0, ans=0.125
2024-08-15 14:28:41,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3225220.0, ans=0.0
2024-08-15 14:28:53,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3225320.0, ans=0.0
2024-08-15 14:29:09,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3225320.0, ans=0.2
2024-08-15 14:29:17,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3225420.0, ans=0.1
2024-08-15 14:29:25,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3225420.0, ans=0.125
2024-08-15 14:29:53,335 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 from AS
2024-08-15 14:29:55,681 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS
2024-08-15 14:30:18,511 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 from AS
2024-08-15 14:30:29,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2024-08-15 14:30:37,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3750, loss[loss=0.1165, beats_loss=0.01067, ecapa_loss=0.000144, whisper_loss=0.1044, over 23764.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001487, whisper_loss=0.09037, over 3834136.22 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:30:50,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3225720.0, ans=0.07
2024-08-15 14:31:00,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.295e+01 2.515e+01 2.786e+01 9.942e+02, threshold=5.030e+01, percent-clipped=1.0
2024-08-15 14:31:12,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3225820.0, ans=0.125
2024-08-15 14:31:42,821 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 14:32:08,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3226020.0, ans=0.0
2024-08-15 14:32:13,089 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 31 from Vox, 34 from AS
2024-08-15 14:32:20,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3226120.0, ans=0.0
2024-08-15 14:32:36,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3226220.0, ans=0.125
2024-08-15 14:32:38,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3800, loss[loss=0.1026, beats_loss=0.0126, ecapa_loss=0.000127, whisper_loss=0.08877, over 17300.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.0001496, whisper_loss=0.09001, over 3850932.08 frames. ], batch size: 69, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:32:38,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3226220.0, ans=0.0
2024-08-15 14:32:53,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0
2024-08-15 14:33:00,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226320.0, ans=0.1
2024-08-15 14:33:09,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2024-08-15 14:33:14,244 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 from AS
2024-08-15 14:33:20,003 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 28 from Vox, 30 from AS
2024-08-15 14:33:37,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=12.0
2024-08-15 14:34:10,469 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3850, loss[loss=0.1016, beats_loss=0.01025, ecapa_loss=0.0001218, whisper_loss=0.09013, over 16700.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.0001492, whisper_loss=0.08948, over 3834915.30 frames. ], batch size: 65, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:34:27,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.293e+01 2.527e+01 2.817e+01 3.723e+01, threshold=5.053e+01, percent-clipped=0.0
2024-08-15 14:34:43,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3226820.0, ans=0.0
2024-08-15 14:34:56,239 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 20 from Vox, 48 from AS
2024-08-15 14:35:17,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3227020.0, ans=0.0
2024-08-15 14:35:22,352 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS
2024-08-15 14:35:22,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3227120.0, ans=0.1
2024-08-15 14:35:43,176 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3900, loss[loss=0.08966, beats_loss=0.01244, ecapa_loss=0.0001251, whisper_loss=0.07597, over 18478.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001493, whisper_loss=0.0905, over 3875256.18 frames. ], batch size: 73, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:35:54,114 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 13 from Vox, 35 from AS
2024-08-15 14:35:59,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3227320.0, ans=0.0
2024-08-15 14:36:03,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3227320.0, ans=0.0
2024-08-15 14:36:06,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0
2024-08-15 14:36:31,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3227420.0, ans=0.125
2024-08-15 14:36:44,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3227520.0, ans=0.125
2024-08-15 14:37:08,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.78 vs. limit=15.0
2024-08-15 14:37:10,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3950, loss[loss=0.09372, beats_loss=0.01058, ecapa_loss=0.000172, whisper_loss=0.08142, over 22439.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001513, whisper_loss=0.09137, over 3889803.72 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:37:23,431 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 from AS
2024-08-15 14:37:23,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3227720.0, ans=0.125
2024-08-15 14:37:26,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.465e+01 2.719e+01 3.087e+01 1.515e+02, threshold=5.437e+01, percent-clipped=3.0
2024-08-15 14:37:30,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3227820.0, ans=0.125
2024-08-15 14:37:37,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3227820.0, ans=0.0
2024-08-15 14:37:45,165 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 from AS
2024-08-15 14:38:20,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3228120.0, ans=0.125
2024-08-15 14:38:35,323 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 from AS
2024-08-15 14:38:39,652 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4000, loss[loss=0.1186, beats_loss=0.007752, ecapa_loss=0.0001845, whisper_loss=0.109, over 17282.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.000151, whisper_loss=0.09099, over 3932313.33 frames. ], batch size: 67, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:38:50,379 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 25 from Vox, 33 from AS
2024-08-15 14:38:53,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3228320.0, ans=0.125
2024-08-15 14:39:07,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0
2024-08-15 14:39:21,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3228420.0, ans=0.1
2024-08-15 14:39:36,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3228520.0, ans=0.125
2024-08-15 14:39:41,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3228520.0, ans=0.125
2024-08-15 14:39:53,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3228620.0, ans=0.125
2024-08-15 14:40:01,997 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 9 from Vox, 36 from AS
2024-08-15 14:40:05,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4050, loss[loss=0.1044, beats_loss=0.0117, ecapa_loss=0.0001668, whisper_loss=0.09101, over 21165.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01047, ecapa_loss=0.0001518, whisper_loss=0.0916, over 3900722.75 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:40:24,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.316e+01 2.614e+01 2.943e+01 4.388e+01, threshold=5.229e+01, percent-clipped=0.0
2024-08-15 14:40:24,527 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 16 from Vox, 37 from AS
2024-08-15 14:40:57,816 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 from AS
2024-08-15 14:41:19,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3229020.0, ans=0.0
2024-08-15 14:41:58,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4100, loss[loss=0.1006, beats_loss=0.01237, ecapa_loss=0.0001012, whisper_loss=0.08722, over 23150.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001518, whisper_loss=0.09117, over 3901584.81 frames. ], batch size: 86, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:42:05,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3229220.0, ans=0.0
2024-08-15 14:42:42,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3229320.0, ans=0.125
2024-08-15 14:42:48,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3229420.0, ans=0.125
2024-08-15 14:42:51,323 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 from AS
2024-08-15 14:43:03,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.68 vs. limit=10.0
2024-08-15 14:43:19,968 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 26 from Vox, 23 from AS
2024-08-15 14:43:29,665 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS
2024-08-15 14:44:01,860 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS
2024-08-15 14:44:04,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4150, loss[loss=0.09377, beats_loss=0.01173, ecapa_loss=0.0001465, whisper_loss=0.08058, over 17844.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001512, whisper_loss=0.09105, over 3917455.45 frames. ], batch size: 70, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:44:26,792 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 30 from Vox, 36 from AS
2024-08-15 14:44:28,210 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.317e+01 2.580e+01 2.886e+01 4.298e+01, threshold=5.160e+01, percent-clipped=0.0
2024-08-15 14:44:48,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3229920.0, ans=0.2
2024-08-15 14:44:59,699 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.325e+00
2024-08-15 14:45:17,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0
2024-08-15 14:45:26,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3230120.0, ans=0.125
2024-08-15 14:45:33,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4200, loss[loss=0.0936, beats_loss=0.01115, ecapa_loss=0.0001237, whisper_loss=0.08121, over 14262.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001505, whisper_loss=0.09125, over 3906090.62 frames. ], batch size: 55, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:45:37,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3230220.0, ans=0.2
2024-08-15 14:45:45,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.95 vs. limit=10.0
2024-08-15 14:45:59,944 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 from AS
2024-08-15 14:46:04,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3230320.0, ans=0.125
2024-08-15 14:46:13,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230420.0, ans=0.1
2024-08-15 14:46:19,480 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS
2024-08-15 14:46:28,231 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS
2024-08-15 14:47:02,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4250, loss[loss=0.108, beats_loss=0.01019, ecapa_loss=0.0001598, whisper_loss=0.09622, over 22312.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001509, whisper_loss=0.0912, over 3898336.01 frames. ], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:47:19,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3230820.0, ans=0.125
2024-08-15 14:47:20,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.308e+01 2.518e+01 2.859e+01 8.550e+01, threshold=5.036e+01, percent-clipped=1.0
2024-08-15 14:47:21,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3230820.0, ans=0.125
2024-08-15 14:47:30,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=22.5
2024-08-15 14:47:46,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3230920.0, ans=0.0
2024-08-15 14:47:58,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.04 vs. limit=6.0
2024-08-15 14:47:59,256 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 from AS
2024-08-15 14:48:14,418 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 22 from LS+wenet, 10 from Vox, 21 from AS
2024-08-15 14:48:14,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0
2024-08-15 14:48:16,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3231120.0, ans=0.125
2024-08-15 14:48:16,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3231120.0, ans=0.125
2024-08-15 14:48:28,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3231120.0, ans=0.2
2024-08-15 14:48:33,572 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 from AS
2024-08-15 14:48:34,511 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4300, loss[loss=0.101, beats_loss=0.01104, ecapa_loss=0.0001729, whisper_loss=0.0882, over 21307.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001514, whisper_loss=0.09133, over 3861928.75 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:48:41,916 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 from AS
2024-08-15 14:49:01,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3231320.0, ans=0.125
2024-08-15 14:49:02,215 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS
2024-08-15 14:49:02,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3231320.0, ans=0.0
2024-08-15 14:49:10,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3231420.0, ans=0.1
2024-08-15 14:49:13,588 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 23 from Vox, 21 from AS
2024-08-15 14:49:17,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0
2024-08-15 14:49:22,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2024-08-15 14:49:29,703 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 from AS
2024-08-15 14:49:30,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3231520.0, ans=0.125
2024-08-15 14:49:32,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3231520.0, ans=0.125
2024-08-15 14:49:49,292 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 15 from Vox, 30 from AS
2024-08-15 14:49:57,842 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-15 14:49:59,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4350, loss[loss=0.1154, beats_loss=0.01151, ecapa_loss=0.0001319, whisper_loss=0.1025, over 23084.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001508, whisper_loss=0.09106, over 3835223.96 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:50:16,728 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 from AS
2024-08-15 14:50:17,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.376e+01 2.619e+01 2.961e+01 5.969e+01, threshold=5.237e+01, percent-clipped=2.0
2024-08-15 14:50:18,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3231820.0, ans=0.1
2024-08-15 14:50:48,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3231920.0, ans=0.125
2024-08-15 14:50:54,960 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 from AS
2024-08-15 14:51:14,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3232120.0, ans=0.2
2024-08-15 14:51:28,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4400, loss[loss=0.09509, beats_loss=0.0131, ecapa_loss=0.0001493, whisper_loss=0.0805, over 20818.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001496, whisper_loss=0.09074, over 3855960.66 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:51:33,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0
2024-08-15 14:51:35,761 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 14:51:54,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3232320.0, ans=0.125 2024-08-15 14:52:15,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3232420.0, ans=0.125 2024-08-15 14:52:27,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3232520.0, ans=0.0 2024-08-15 14:52:41,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-08-15 14:52:51,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4450, loss[loss=0.08815, beats_loss=0.00836, ecapa_loss=0.00018, whisper_loss=0.07799, over 18346.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001506, whisper_loss=0.09087, over 3866390.25 frames. ], batch size: 72, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:52:53,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2024-08-15 14:53:05,052 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-15 14:53:08,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+01 2.314e+01 2.575e+01 2.815e+01 3.995e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-15 14:53:09,152 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 14:53:32,568 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 14:53:47,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3233020.0, ans=0.0 2024-08-15 14:54:25,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4500, loss[loss=0.1034, beats_loss=0.009553, ecapa_loss=0.0001262, whisper_loss=0.09254, over 17324.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.0898, over 3870000.00 frames. ], batch size: 65, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:54:28,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2024-08-15 14:54:50,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3233320.0, ans=0.1 2024-08-15 14:54:55,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3233320.0, ans=0.125 2024-08-15 14:55:15,457 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-15 14:55:22,364 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 35 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-15 14:55:32,386 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 14:55:38,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3233620.0, ans=0.125 2024-08-15 14:55:42,685 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 14:55:51,201 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4550, loss[loss=0.1002, beats_loss=0.01135, ecapa_loss=0.0001816, whisper_loss=0.08706, over 21149.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001521, whisper_loss=0.09118, over 3882331.26 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:55:55,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. limit=10.0 2024-08-15 14:56:02,574 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 14:56:07,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.406e+01 2.642e+01 2.962e+01 1.202e+02, threshold=5.284e+01, percent-clipped=1.0 2024-08-15 14:56:18,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3233820.0, ans=0.125 2024-08-15 14:56:20,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-15 14:57:02,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3234120.0, ans=0.125 2024-08-15 14:57:03,771 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 14:57:11,505 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 14:57:18,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4600, loss[loss=0.1125, beats_loss=0.008773, ecapa_loss=0.0001669, whisper_loss=0.1021, over 21107.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01055, ecapa_loss=0.0001514, whisper_loss=0.09156, over 3884419.06 frames. ], batch size: 81, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:57:22,397 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
28 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-15 14:57:33,445 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 14:57:35,347 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 14:57:50,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3234320.0, ans=0.125 2024-08-15 14:58:03,438 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 14:58:06,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3234520.0, ans=0.125 2024-08-15 14:58:13,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3234520.0, ans=0.125 2024-08-15 14:58:40,005 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4650, loss[loss=0.1017, beats_loss=0.009686, ecapa_loss=0.0001357, whisper_loss=0.09066, over 18771.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001514, whisper_loss=0.09047, over 3873689.28 frames. ], batch size: 74, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:58:56,437 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.314e+01 2.498e+01 2.884e+01 4.685e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-15 14:58:57,126 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 14:59:00,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3234820.0, ans=0.2 2024-08-15 14:59:02,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3234820.0, ans=0.0 2024-08-15 14:59:08,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3234820.0, ans=0.0 2024-08-15 14:59:08,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5 2024-08-15 14:59:17,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234920.0, ans=0.1 2024-08-15 14:59:19,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-15 14:59:24,997 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 14:59:25,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3234920.0, ans=0.0 2024-08-15 14:59:31,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3235020.0, ans=0.125 2024-08-15 14:59:37,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3235020.0, ans=0.125 2024-08-15 14:59:39,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.19 vs. 
limit=12.0 2024-08-15 14:59:50,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3235120.0, ans=0.125 2024-08-15 15:00:08,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4700, loss[loss=0.1214, beats_loss=0.009578, ecapa_loss=0.0001388, whisper_loss=0.1104, over 22732.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001526, whisper_loss=0.09108, over 3902089.27 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:00:35,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3235320.0, ans=0.0 2024-08-15 15:00:55,294 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:00:59,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3235520.0, ans=0.0 2024-08-15 15:01:04,522 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 15:01:13,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3235520.0, ans=0.1 2024-08-15 15:01:15,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2024-08-15 15:01:20,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3235620.0, ans=0.0 2024-08-15 15:01:23,864 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 15:01:32,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3235720.0, ans=0.125 2024-08-15 15:01:33,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4750, loss[loss=0.1184, beats_loss=0.008333, ecapa_loss=0.0001721, whisper_loss=0.1083, over 22389.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001516, whisper_loss=0.0908, over 3878475.20 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:01:33,639 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 15:01:36,540 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 15:01:39,334 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 15:01:41,270 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 15:01:42,498 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 15:01:49,035 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.242e+01 2.450e+01 2.790e+01 3.790e+01, threshold=4.901e+01, percent-clipped=0.0 2024-08-15 15:01:52,236 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 15:01:55,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3235820.0, ans=0.0 2024-08-15 15:01:55,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3235820.0, ans=0.125 2024-08-15 15:02:17,009 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 15:02:31,658 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 15:02:47,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3236120.0, ans=0.2 2024-08-15 15:02:51,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4800, loss[loss=0.08883, beats_loss=0.01173, ecapa_loss=0.0001398, whisper_loss=0.0757, over 13875.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001517, whisper_loss=0.09042, over 3899938.06 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:03:18,117 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 15:03:37,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3236520.0, ans=0.09899494936611666 2024-08-15 15:03:43,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3236520.0, ans=0.2 2024-08-15 15:03:58,302 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.862e+01 2024-08-15 15:04:09,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4850, loss[loss=0.106, beats_loss=0.009743, ecapa_loss=0.0001399, whisper_loss=0.09482, over 20093.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001507, whisper_loss=0.09066, over 3926499.64 frames. 
], batch size: 79, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:04:24,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.428e+01 2.638e+01 3.060e+01 4.898e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-15 15:04:41,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3236920.0, ans=0.125 2024-08-15 15:04:43,999 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 15:04:51,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3236920.0, ans=0.2 2024-08-15 15:04:56,555 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 15:05:09,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3237120.0, ans=0.0 2024-08-15 15:05:09,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3237120.0, ans=0.2 2024-08-15 15:05:18,054 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 15:05:19,258 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-15 15:05:20,636 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 15:05:21,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4900, loss[loss=0.111, beats_loss=0.009434, ecapa_loss=0.0001337, whisper_loss=0.1002, over 14986.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001506, whisper_loss=0.09098, over 3943001.07 frames. 
], batch size: 56, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:05:32,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3237220.0, ans=0.04949747468305833 2024-08-15 15:05:33,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3237220.0, ans=0.1 2024-08-15 15:05:44,429 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 15:05:50,169 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 15:06:04,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3237520.0, ans=0.0 2024-08-15 15:06:17,725 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 15:06:22,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3237620.0, ans=0.0 2024-08-15 15:06:23,764 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 15:06:23,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3237620.0, ans=0.0 2024-08-15 15:06:27,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3237620.0, ans=0.02 2024-08-15 15:06:29,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2024-08-15 15:06:30,236 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
24 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-15 15:06:31,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4950, loss[loss=0.09088, beats_loss=0.009962, ecapa_loss=0.0001871, whisper_loss=0.07905, over 21584.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001521, whisper_loss=0.09047, over 3929507.87 frames. ], batch size: 91, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:06:45,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.352e+01 2.615e+01 2.945e+01 2.370e+02, threshold=5.229e+01, percent-clipped=2.0 2024-08-15 15:06:52,693 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 15:06:56,669 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 43 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 15:07:02,132 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-15 15:07:10,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3237920.0, ans=0.0 2024-08-15 15:07:25,967 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 15:07:28,556 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 15:07:28,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3238120.0, ans=0.125 2024-08-15 15:07:31,154 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-15 15:07:33,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=15.0 2024-08-15 15:07:40,722 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5000, loss[loss=0.09151, beats_loss=0.01149, ecapa_loss=0.0001639, whisper_loss=0.07838, over 18609.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001522, whisper_loss=0.09057, over 3903036.92 frames. ], batch size: 79, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:07:42,277 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 15:07:43,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.54 vs. limit=10.0 2024-08-15 15:07:47,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3238220.0, ans=15.0 2024-08-15 15:08:00,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3238320.0, ans=0.04949747468305833 2024-08-15 15:08:40,677 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 15:08:48,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5050, loss[loss=0.09673, beats_loss=0.01107, ecapa_loss=0.0001564, whisper_loss=0.0841, over 15743.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.000152, whisper_loss=0.09056, over 3901364.17 frames. 
], batch size: 67, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:08:59,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3238720.0, ans=0.125 2024-08-15 15:09:01,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3238820.0, ans=0.2 2024-08-15 15:09:02,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.271e+01 2.496e+01 2.876e+01 1.159e+02, threshold=4.993e+01, percent-clipped=2.0 2024-08-15 15:09:02,193 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 15:09:03,406 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 15:09:07,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-15 15:09:08,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3238820.0, ans=0.2 2024-08-15 15:09:55,797 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5100, loss[loss=0.09364, beats_loss=0.01191, ecapa_loss=0.0001773, whisper_loss=0.07995, over 21206.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001506, whisper_loss=0.09047, over 3900281.61 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:09:55,987 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 15:10:13,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3239320.0, ans=0.125 2024-08-15 15:10:15,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3239320.0, ans=0.1 2024-08-15 15:10:24,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3239420.0, ans=0.2 2024-08-15 15:10:25,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-08-15 15:10:26,964 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 15:10:29,494 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 15:10:33,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3239420.0, ans=0.0 2024-08-15 15:10:57,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3239620.0, ans=0.0 2024-08-15 15:10:57,091 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.952e+05 2024-08-15 15:11:03,286 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5150, loss[loss=0.0885, beats_loss=0.01163, ecapa_loss=0.0001719, whisper_loss=0.07515, over 18699.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001501, whisper_loss=0.09108, over 3906964.52 frames. 
], batch size: 79, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:11:16,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3239820.0, ans=0.125 2024-08-15 15:11:16,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.367e+01 2.663e+01 3.034e+01 8.372e+01, threshold=5.326e+01, percent-clipped=1.0 2024-08-15 15:11:21,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3239820.0, ans=0.125 2024-08-15 15:11:22,880 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 15:11:33,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3239920.0, ans=0.025 2024-08-15 15:11:40,408 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-324000.pt 2024-08-15 15:12:15,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5200, loss[loss=0.1219, beats_loss=0.00884, ecapa_loss=0.000105, whisper_loss=0.112, over 22765.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001498, whisper_loss=0.09097, over 3886603.44 frames. ], batch size: 81, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:12:16,868 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 15:12:25,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. 
limit=15.0 2024-08-15 15:12:45,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3240420.0, ans=0.125 2024-08-15 15:13:00,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3240520.0, ans=0.125 2024-08-15 15:13:18,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3240620.0, ans=0.125 2024-08-15 15:13:22,129 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-15 15:13:26,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5250, loss[loss=0.1041, beats_loss=0.01023, ecapa_loss=0.0001541, whisper_loss=0.09237, over 21306.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001512, whisper_loss=0.09102, over 3908403.83 frames. ], batch size: 87, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:13:28,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.65 vs. 
limit=6.0 2024-08-15 15:13:30,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3240720.0, ans=0.0 2024-08-15 15:13:34,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3240720.0, ans=10.0 2024-08-15 15:13:34,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3240720.0, ans=0.2 2024-08-15 15:13:35,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3240720.0, ans=0.2 2024-08-15 15:13:36,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3240720.0, ans=0.125 2024-08-15 15:13:40,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.275e+01 2.578e+01 2.785e+01 8.879e+01, threshold=5.156e+01, percent-clipped=2.0 2024-08-15 15:13:42,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3240820.0, ans=0.125 2024-08-15 15:13:45,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3240820.0, ans=0.1 2024-08-15 15:13:50,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3240820.0, ans=0.125 2024-08-15 15:13:52,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3240820.0, ans=0.2 2024-08-15 15:13:55,705 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 15:14:34,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3241120.0, ans=0.0 2024-08-15 15:14:37,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5300, loss[loss=0.09712, beats_loss=0.0117, ecapa_loss=0.0001628, whisper_loss=0.08379, over 22778.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001504, whisper_loss=0.09115, over 3898764.68 frames. ], batch size: 95, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:14:37,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3241220.0, ans=0.125 2024-08-15 15:14:40,515 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 15:14:44,863 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:14:46,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3241220.0, ans=0.2 2024-08-15 15:14:59,002 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 32 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 15:14:59,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3241320.0, ans=0.0 2024-08-15 15:15:10,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3241420.0, ans=0.0 2024-08-15 15:15:17,293 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 15:15:23,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3241520.0, ans=0.5 2024-08-15 15:15:24,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2024-08-15 15:15:47,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5350, loss[loss=0.1116, beats_loss=0.009583, ecapa_loss=0.0001023, whisper_loss=0.101, over 19287.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001483, whisper_loss=0.09072, over 3852109.32 frames. ], batch size: 70, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:15:47,915 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 15:15:50,622 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:16:01,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.335e+01 2.708e+01 3.077e+01 2.135e+02, threshold=5.416e+01, percent-clipped=3.0 2024-08-15 15:16:06,096 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 15:16:08,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3241820.0, ans=0.125 2024-08-15 15:16:10,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3241820.0, ans=0.07 2024-08-15 15:16:12,568 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 15:16:17,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3241920.0, ans=0.125 2024-08-15 15:16:22,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3241920.0, ans=0.2 2024-08-15 15:16:23,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3241920.0, ans=0.0 2024-08-15 15:16:40,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3242020.0, ans=0.2 2024-08-15 15:16:40,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3242020.0, ans=0.07 2024-08-15 15:16:40,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3242020.0, ans=0.2 2024-08-15 15:16:43,122 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-15 15:16:48,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-08-15 15:16:56,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5400, loss[loss=0.1027, beats_loss=0.01335, ecapa_loss=0.0001527, whisper_loss=0.08782, over 19483.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001488, whisper_loss=0.09052, over 3883896.43 frames. 
], batch size: 81, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:17:03,578 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08951901644468307, model_norm_threshold=54.15595626831055 2024-08-15 15:17:03,772 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.765e+04, grad_sumsq=4.730e+06, orig_rms_sq=1.007e-02 2024-08-15 15:17:16,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3242320.0, ans=0.035 2024-08-15 15:17:42,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3242520.0, ans=0.1 2024-08-15 15:17:54,740 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 15:17:55,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-15 15:18:01,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3242620.0, ans=0.0 2024-08-15 15:18:06,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3242720.0, ans=0.125 2024-08-15 15:18:07,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5450, loss[loss=0.09448, beats_loss=0.01053, ecapa_loss=0.0001139, whisper_loss=0.08281, over 15778.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001492, whisper_loss=0.09079, over 3869578.96 frames. 
], batch size: 60, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:18:08,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3242720.0, ans=0.125 2024-08-15 15:18:15,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3242720.0, ans=0.0 2024-08-15 15:18:21,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0 2024-08-15 15:18:22,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.261e+01 2.541e+01 2.887e+01 6.050e+02, threshold=5.082e+01, percent-clipped=2.0 2024-08-15 15:18:34,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3242820.0, ans=0.125 2024-08-15 15:18:35,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3242820.0, ans=0.125 2024-08-15 15:18:47,023 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 15:18:52,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-15 15:18:59,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3243020.0, ans=0.125 2024-08-15 15:19:21,954 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 15:19:24,618 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 15:19:26,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5500, loss[loss=0.1003, beats_loss=0.0108, ecapa_loss=0.0001624, whisper_loss=0.08787, over 20838.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001498, whisper_loss=0.09051, over 3874541.15 frames. ], batch size: 87, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:19:52,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3243320.0, ans=0.0 2024-08-15 15:19:53,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3243320.0, ans=0.07 2024-08-15 15:19:59,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3243420.0, ans=10.0 2024-08-15 15:20:23,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3243520.0, ans=0.0 2024-08-15 15:20:49,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5550, loss[loss=0.1061, beats_loss=0.01095, ecapa_loss=0.0001498, whisper_loss=0.09369, over 22458.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01076, ecapa_loss=0.0001486, whisper_loss=0.08951, over 3894927.22 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:21:06,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.298e+01 2.585e+01 2.775e+01 4.176e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 15:21:06,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-15 15:21:13,442 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.429e-02 2024-08-15 15:21:13,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3243820.0, ans=0.125 2024-08-15 15:21:23,785 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 15:21:26,609 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 15:21:34,302 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 15:21:51,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3244020.0, ans=0.125 2024-08-15 15:22:04,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3244120.0, ans=0.125 2024-08-15 15:22:10,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3244120.0, ans=0.125 2024-08-15 15:22:13,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5600, loss[loss=0.1157, beats_loss=0.009133, ecapa_loss=0.0001683, whisper_loss=0.1049, over 16274.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001482, whisper_loss=0.08992, over 3880804.34 frames. ], batch size: 64, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:22:32,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3244320.0, ans=0.125 2024-08-15 15:22:41,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3244320.0, ans=0.125 2024-08-15 15:23:24,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3244620.0, ans=0.04949747468305833 2024-08-15 15:23:30,837 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-15 15:23:35,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5650, loss[loss=0.1036, beats_loss=0.011, ecapa_loss=0.00014, whisper_loss=0.09123, over 22726.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001484, whisper_loss=0.08987, over 3885685.24 frames. ], batch size: 94, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:23:36,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3244720.0, ans=0.125 2024-08-15 15:23:55,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.317e+01 2.489e+01 2.780e+01 3.847e+01, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 15:24:18,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=12.0 2024-08-15 15:24:29,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=12.0 2024-08-15 15:24:31,454 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 19 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 15:24:33,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-15 15:24:35,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3245020.0, ans=0.125 2024-08-15 15:24:55,717 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 15:24:59,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3245120.0, ans=0.0 2024-08-15 15:25:01,895 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5700, loss[loss=0.1123, beats_loss=0.009757, ecapa_loss=0.0001448, whisper_loss=0.1011, over 22187.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001505, whisper_loss=0.09014, over 3914592.25 frames. 
], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:25:04,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3245220.0, ans=0.0 2024-08-15 15:25:27,005 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-15 15:25:34,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-15 15:25:38,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0 2024-08-15 15:25:46,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3245420.0, ans=0.125 2024-08-15 15:25:54,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-15 15:26:10,490 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 15:26:12,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3245620.0, ans=0.1 2024-08-15 15:26:16,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3245620.0, ans=0.125 2024-08-15 15:26:28,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5750, loss[loss=0.1158, beats_loss=0.008979, ecapa_loss=0.0001555, whisper_loss=0.1052, over 24022.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001496, whisper_loss=0.08997, over 3945476.32 frames. 
], batch size: 95, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:26:38,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3245720.0, ans=0.2 2024-08-15 15:26:45,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.60 vs. limit=22.5 2024-08-15 15:26:46,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.388e+01 2.608e+01 2.971e+01 1.987e+02, threshold=5.216e+01, percent-clipped=2.0 2024-08-15 15:26:48,080 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 12 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 15:26:50,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3245820.0, ans=0.2 2024-08-15 15:27:03,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3245920.0, ans=0.125 2024-08-15 15:27:17,856 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 15:27:24,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3246020.0, ans=0.125 2024-08-15 15:27:33,154 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:27:37,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3246120.0, ans=0.125 2024-08-15 15:27:44,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3246120.0, ans=0.125 2024-08-15 15:27:51,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3246120.0, ans=0.125 2024-08-15 15:27:53,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5800, loss[loss=0.09272, beats_loss=0.01201, ecapa_loss=0.0001349, whisper_loss=0.07936, over 22322.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001496, whisper_loss=0.09006, over 3905268.64 frames. ], batch size: 89, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:27:54,091 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 15:28:13,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3246320.0, ans=0.02 2024-08-15 15:28:32,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.57 vs. limit=10.0 2024-08-15 15:28:44,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3246520.0, ans=0.125 2024-08-15 15:28:53,883 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 15:28:57,872 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 15:28:58,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=12.0 2024-08-15 15:29:04,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3246620.0, ans=0.125 2024-08-15 15:29:11,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5850, loss[loss=0.1085, beats_loss=0.009238, ecapa_loss=0.0001643, whisper_loss=0.09764, over 22697.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001498, whisper_loss=0.08965, over 3899690.81 frames. ], batch size: 91, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:29:18,038 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 15:29:18,399 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:29:26,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.269e+01 2.533e+01 2.888e+01 4.930e+01, threshold=5.067e+01, percent-clipped=0.0 2024-08-15 15:29:58,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3247020.0, ans=0.2 2024-08-15 15:29:58,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3247020.0, ans=0.5 2024-08-15 15:30:12,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3247120.0, ans=0.125 2024-08-15 15:30:20,087 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 15:30:24,741 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 15:30:25,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5900, loss[loss=0.1036, beats_loss=0.009249, ecapa_loss=0.000158, whisper_loss=0.09278, over 21098.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001494, whisper_loss=0.08934, over 3909895.47 frames. ], batch size: 83, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:30:45,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3247320.0, ans=0.125 2024-08-15 15:31:11,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3247420.0, ans=0.0 2024-08-15 15:31:15,794 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 15:31:18,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5 2024-08-15 15:31:34,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3247620.0, ans=0.125 2024-08-15 15:31:43,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5950, loss[loss=0.1016, beats_loss=0.00993, ecapa_loss=0.0001981, whisper_loss=0.0897, over 21838.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001507, whisper_loss=0.08968, over 3910262.03 frames. ], batch size: 89, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:31:51,477 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 15:31:54,790 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 15:31:57,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3247820.0, ans=0.125 2024-08-15 15:31:58,718 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.298e+01 2.552e+01 2.863e+01 3.856e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-15 15:32:03,323 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 17 from Vox, 56 fro AS 2024-08-15 15:32:18,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-15 15:32:21,677 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 15:32:36,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0 2024-08-15 15:32:54,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6000, loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.000145, whisper_loss=0.09103, over 22027.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001502, whisper_loss=0.08971, over 3891484.87 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:32:54,477 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 15:33:33,400 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005302, whisper_loss=0.2464, over 922467.00 frames. 2024-08-15 15:33:54,232 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004186, beats_loss=0, ecapa_loss=0.0004186, whisper_loss=0, over 939242.00 frames. 
2024-08-15 15:35:52,415 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 15:35:52,424 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 15:35:55,358 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 15:36:12,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3248320.0, ans=0.125 2024-08-15 15:36:14,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248320.0, ans=0.1 2024-08-15 15:36:21,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-08-15 15:36:31,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3248420.0, ans=0.0 2024-08-15 15:36:32,462 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 15:36:39,268 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 15:36:50,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3248620.0, ans=0.0 2024-08-15 15:37:02,509 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6050, loss[loss=0.1092, beats_loss=0.01037, ecapa_loss=0.0001697, whisper_loss=0.09715, over 20946.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001497, whisper_loss=0.09, over 3900559.24 frames. 
], batch size: 87, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:37:04,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3248720.0, ans=0.125 2024-08-15 15:37:13,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3248720.0, ans=0.05 2024-08-15 15:37:16,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.402e+01 2.591e+01 2.977e+01 8.754e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-15 15:37:32,597 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 15:37:40,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3248920.0, ans=0.05 2024-08-15 15:38:04,714 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 15:38:12,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6100, loss[loss=0.09132, beats_loss=0.01239, ecapa_loss=0.0001373, whisper_loss=0.07756, over 18027.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001498, whisper_loss=0.09026, over 3902044.70 frames. ], batch size: 72, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:38:13,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-15 15:38:20,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3249220.0, ans=0.07 2024-08-15 15:38:25,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3249220.0, ans=0.025 2024-08-15 15:38:28,914 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 15:38:49,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=22.5 2024-08-15 15:38:52,340 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 16 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 15:39:22,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6150, loss[loss=0.1074, beats_loss=0.008353, ecapa_loss=0.0001598, whisper_loss=0.09744, over 16110.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001505, whisper_loss=0.0906, over 3891825.35 frames. ], batch size: 64, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:39:33,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.75 vs. limit=10.0 2024-08-15 15:39:37,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.219e+01 2.437e+01 2.676e+01 4.381e+01, threshold=4.874e+01, percent-clipped=0.0 2024-08-15 15:39:48,904 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 15:39:51,717 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 42 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 15:40:05,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3250020.0, ans=0.0 2024-08-15 15:40:21,684 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 15:40:23,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3250120.0, ans=0.125 2024-08-15 15:40:34,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6200, loss[loss=0.09673, beats_loss=0.007865, ecapa_loss=0.0001868, whisper_loss=0.087, over 17419.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001511, whisper_loss=0.09023, over 3882731.50 frames. ], batch size: 70, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:40:51,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-15 15:40:58,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2024-08-15 15:41:09,920 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 15:41:14,631 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.133e-02 2024-08-15 15:41:40,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-08-15 15:41:46,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6250, loss[loss=0.08912, beats_loss=0.008229, ecapa_loss=0.0001659, whisper_loss=0.07923, over 17063.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001508, whisper_loss=0.09074, over 3883931.15 frames. ], batch size: 68, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:41:48,207 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-15 15:41:59,312 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 15:42:00,776 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
35 from LS+wenet, 17 from Vox, 40 from AS 2024-08-15 15:42:01,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.383e+01 2.624e+01 2.920e+01 1.622e+02, threshold=5.248e+01, percent-clipped=1.0 2024-08-15 15:42:02,424 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:42:03,541 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 27 from Vox, 33 from AS 2024-08-15 15:42:04,885 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 from AS 2024-08-15 15:42:05,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3250820.0, ans=0.5 2024-08-15 15:42:07,810 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 from AS 2024-08-15 15:42:15,012 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 from AS 2024-08-15 15:42:27,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.37 vs. limit=10.0 2024-08-15 15:42:35,928 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 15:42:56,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6300, loss[loss=0.1055, beats_loss=0.01322, ecapa_loss=0.0001424, whisper_loss=0.09082, over 23630.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001502, whisper_loss=0.09148, over 3929866.13 frames. ], batch size: 95, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:43:05,178 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts.
29 from LS+wenet, 25 from Vox, 26 from AS 2024-08-15 15:43:05,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3251220.0, ans=0.125 2024-08-15 15:43:22,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-15 15:43:30,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3251420.0, ans=0.09899494936611666 2024-08-15 15:43:31,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3251420.0, ans=0.1 2024-08-15 15:43:41,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3251520.0, ans=0.1 2024-08-15 15:43:48,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3251520.0, ans=0.2 2024-08-15 15:43:51,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3251620.0, ans=0.125 2024-08-15 15:43:54,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3251620.0, ans=0.2 2024-08-15 15:43:59,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3251620.0, ans=0.125 2024-08-15 15:44:06,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6350, loss[loss=0.1046, beats_loss=0.007814, ecapa_loss=0.0001527, whisper_loss=0.09529, over 18255.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001511, whisper_loss=0.09141, over 3924840.26 frames.
], batch size: 70, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:44:14,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3251720.0, ans=0.0 2024-08-15 15:44:22,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.294e+01 2.523e+01 2.815e+01 3.585e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-15 15:44:34,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3251920.0, ans=0.2 2024-08-15 15:44:46,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3251920.0, ans=0.125 2024-08-15 15:45:17,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6400, loss[loss=0.1068, beats_loss=0.009916, ecapa_loss=0.000145, whisper_loss=0.09546, over 14021.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001511, whisper_loss=0.09115, over 3910513.04 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:45:19,336 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 from AS 2024-08-15 15:45:19,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3252220.0, ans=0.125 2024-08-15 15:45:20,747 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 from AS 2024-08-15 15:45:31,808 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 15:45:33,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3252320.0, ans=0.1 2024-08-15 15:45:35,939 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts.
15 from LS+wenet, 16 from Vox, 25 from AS 2024-08-15 15:45:51,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3252420.0, ans=0.2 2024-08-15 15:46:03,076 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-15 15:46:05,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2024-08-15 15:46:10,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-15 15:46:11,263 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 from AS 2024-08-15 15:46:27,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6450, loss[loss=0.09366, beats_loss=0.01203, ecapa_loss=0.0001219, whisper_loss=0.08041, over 23678.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01055, ecapa_loss=0.0001514, whisper_loss=0.09146, over 3938303.23 frames. ], batch size: 95, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:46:33,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3252720.0, ans=0.1 2024-08-15 15:46:42,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.348e+01 2.696e+01 2.907e+01 4.718e+01, threshold=5.393e+01, percent-clipped=0.0 2024-08-15 15:46:46,176 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:46:53,454 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
31 from LS+wenet, 21 from Vox, 41 from AS 2024-08-15 15:46:56,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3252920.0, ans=10.0 2024-08-15 15:46:57,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3252920.0, ans=0.125 2024-08-15 15:46:57,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3252920.0, ans=0.1 2024-08-15 15:46:59,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3252920.0, ans=0.125 2024-08-15 15:47:01,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3252920.0, ans=0.0 2024-08-15 15:47:17,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-08-15 15:47:19,861 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 from AS 2024-08-15 15:47:29,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3253120.0, ans=0.2 2024-08-15 15:47:31,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3253120.0, ans=0.125 2024-08-15 15:47:41,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6500, loss[loss=0.1192, beats_loss=0.01008, ecapa_loss=0.0001474, whisper_loss=0.1076, over 18529.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01052, ecapa_loss=0.0001501, whisper_loss=0.09196, over 3926371.27 frames.
], batch size: 74, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:47:49,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3253220.0, ans=0.0 2024-08-15 15:48:08,949 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 28 from LS+wenet, 32 from Vox, 36 from AS 2024-08-15 15:48:21,145 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-15 15:48:27,785 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07086287438869476, model_norm_threshold=53.929649353027344 2024-08-15 15:48:27,983 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.590e+04, grad_sumsq=8.502e+06, orig_rms_sq=1.010e-02 2024-08-15 15:48:34,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3253520.0, ans=0.1 2024-08-15 15:48:35,570 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 39 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 15:48:45,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-15 15:48:56,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6550, loss[loss=0.08854, beats_loss=0.01002, ecapa_loss=0.0001833, whisper_loss=0.07669, over 20804.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01051, ecapa_loss=0.0001501, whisper_loss=0.09182, over 3958066.09 frames. ], batch size: 89, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:48:57,667 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts.
13 from LS+wenet, 10 from Vox, 33 from AS 2024-08-15 15:49:03,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3253720.0, ans=0.125 2024-08-15 15:49:03,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3253720.0, ans=0.125 2024-08-15 15:49:11,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.371e+01 2.638e+01 2.935e+01 7.610e+02, threshold=5.275e+01, percent-clipped=2.0 2024-08-15 15:49:20,718 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-15 15:49:23,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3253920.0, ans=0.125 2024-08-15 15:49:35,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3253920.0, ans=0.1 2024-08-15 15:49:35,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3253920.0, ans=0.1 2024-08-15 15:49:39,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3254020.0, ans=0.0 2024-08-15 15:49:39,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3254020.0, ans=0.125 2024-08-15 15:49:45,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.79 vs.
limit=10.0 2024-08-15 15:49:57,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3254120.0, ans=0.0 2024-08-15 15:49:58,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-15 15:50:07,739 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6600, loss[loss=0.127, beats_loss=0.008667, ecapa_loss=0.0001516, whisper_loss=0.1169, over 22308.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01049, ecapa_loss=0.0001506, whisper_loss=0.09212, over 3962019.25 frames. ], batch size: 86, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:50:47,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3254420.0, ans=0.0 2024-08-15 15:50:50,808 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 from AS 2024-08-15 15:50:51,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3254520.0, ans=0.125 2024-08-15 15:51:19,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6650, loss[loss=0.1083, beats_loss=0.009525, ecapa_loss=0.0001744, whisper_loss=0.09702, over 20144.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001516, whisper_loss=0.09108, over 3924809.49 frames. ], batch size: 82, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:51:20,009 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts.
29 from LS+wenet, 16 from Vox, 31 from AS 2024-08-15 15:51:21,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3254720.0, ans=0.0 2024-08-15 15:51:35,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.370e+01 2.592e+01 2.847e+01 4.238e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-15 15:51:48,400 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS 2024-08-15 15:51:49,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254920.0, ans=0.1 2024-08-15 15:52:01,393 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 10 from Vox, 27 from AS 2024-08-15 15:52:07,473 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 from AS 2024-08-15 15:52:26,880 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 19 from Vox, 47 from AS 2024-08-15 15:52:30,171 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 from AS 2024-08-15 15:52:33,189 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6700, loss[loss=0.1176, beats_loss=0.008824, ecapa_loss=0.0001969, whisper_loss=0.1068, over 19937.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001508, whisper_loss=0.09074, over 3915160.38 frames.
], batch size: 84, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:52:51,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3255320.0, ans=10.0 2024-08-15 15:53:01,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3255320.0, ans=0.125 2024-08-15 15:53:11,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3255420.0, ans=0.125 2024-08-15 15:53:30,647 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 from AS 2024-08-15 15:53:36,417 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 27 from Vox, 25 from AS 2024-08-15 15:53:38,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3255620.0, ans=0.2 2024-08-15 15:53:45,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6750, loss[loss=0.1035, beats_loss=0.01181, ecapa_loss=0.0001257, whisper_loss=0.09039, over 19417.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001513, whisper_loss=0.09074, over 3898132.44 frames. ], batch size: 76, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:54:01,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.293e+01 2.545e+01 2.878e+01 4.170e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 15:54:05,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-08-15 15:54:14,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3255920.0, ans=0.125 2024-08-15 15:54:17,173 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts.
28 from LS+wenet, 12 from Vox, 26 from AS 2024-08-15 15:54:37,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2024-08-15 15:54:41,364 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 15:54:42,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-15 15:54:45,438 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 from AS 2024-08-15 15:54:47,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-08-15 15:54:53,511 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 from AS 2024-08-15 15:54:56,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6800, loss[loss=0.1161, beats_loss=0.007677, ecapa_loss=0.0001851, whisper_loss=0.1066, over 15550.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.000153, whisper_loss=0.09034, over 3863521.35 frames. ], batch size: 61, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:55:11,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.16 vs.
limit=6.0 2024-08-15 15:55:32,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3256420.0, ans=0.125 2024-08-15 15:55:40,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3256520.0, ans=0.2 2024-08-15 15:55:43,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3256520.0, ans=0.1 2024-08-15 15:55:46,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3256520.0, ans=0.0 2024-08-15 15:55:54,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3256620.0, ans=0.1 2024-08-15 15:56:02,423 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 from AS 2024-08-15 15:56:02,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3256620.0, ans=0.125 2024-08-15 15:56:06,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6850, loss[loss=0.1066, beats_loss=0.008983, ecapa_loss=0.0001365, whisper_loss=0.09629, over 17791.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001524, whisper_loss=0.08987, over 3831105.95 frames. ], batch size: 68, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:56:17,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3256720.0, ans=0.1 2024-08-15 15:56:22,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.268e+01 2.467e+01 2.871e+01 7.953e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-15 15:56:28,853 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts.
18 from LS+wenet, 25 from Vox, 36 from AS 2024-08-15 15:56:33,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3256820.0, ans=0.125 2024-08-15 15:56:33,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3256820.0, ans=0.2 2024-08-15 15:56:37,401 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=8.0 2024-08-15 15:56:43,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.02 vs. limit=6.0 2024-08-15 15:56:44,716 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 from AS 2024-08-15 15:57:00,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3257020.0, ans=0.0 2024-08-15 15:57:04,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3257120.0, ans=0.0 2024-08-15 15:57:07,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3257120.0, ans=0.125 2024-08-15 15:57:13,284 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-15 15:57:20,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6900, loss[loss=0.1132, beats_loss=0.01057, ecapa_loss=0.0001405, whisper_loss=0.1012, over 24418.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001518, whisper_loss=0.08977, over 3849736.87 frames.
], batch size: 95, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:57:22,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3257220.0, ans=0.125 2024-08-15 15:57:26,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-15 15:58:00,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-15 15:58:22,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3257620.0, ans=0.125 2024-08-15 15:58:25,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-15 15:58:34,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6950, loss[loss=0.09297, beats_loss=0.01137, ecapa_loss=0.000118, whisper_loss=0.08042, over 23295.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001506, whisper_loss=0.09002, over 3884837.58 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:58:48,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3257820.0, ans=0.125 2024-08-15 15:58:49,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.345e+01 2.623e+01 2.937e+01 1.105e+02, threshold=5.245e+01, percent-clipped=3.0 2024-08-15 15:59:07,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3257920.0, ans=0.125 2024-08-15 15:59:08,420 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 23 from Vox, 23 from AS 2024-08-15 15:59:17,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258020.0, ans=0.1 2024-08-15 15:59:31,315 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 15:59:33,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3258120.0, ans=0.0 2024-08-15 15:59:33,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-15 15:59:44,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7000, loss[loss=0.1097, beats_loss=0.01195, ecapa_loss=0.00012, whisper_loss=0.0966, over 21755.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001489, whisper_loss=0.08993, over 3861740.92 frames. ], batch size: 87, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:00:08,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0 2024-08-15 16:00:14,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3258420.0, ans=0.2 2024-08-15 16:00:18,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3258420.0, ans=0.0 2024-08-15 16:00:22,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2024-08-15 16:00:24,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3258520.0, ans=0.0 2024-08-15 16:00:25,602 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
23 from LS+wenet, 30 from Vox, 32 from AS 2024-08-15 16:00:26,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3258520.0, ans=15.0 2024-08-15 16:00:53,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7050, loss[loss=0.1094, beats_loss=0.008738, ecapa_loss=0.0001621, whisper_loss=0.099, over 19682.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001483, whisper_loss=0.09019, over 3899218.73 frames. ], batch size: 78, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:00:59,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3258720.0, ans=0.0 2024-08-15 16:01:02,167 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS 2024-08-15 16:01:05,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3258720.0, ans=0.125 2024-08-15 16:01:08,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.307e+01 2.519e+01 2.895e+01 2.053e+02, threshold=5.037e+01, percent-clipped=1.0 2024-08-15 16:01:25,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3258920.0, ans=0.0 2024-08-15 16:01:33,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-08-15 16:01:41,487 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 from AS 2024-08-15 16:01:48,530 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 16:01:49,837 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
30 from LS+wenet, 20 from Vox, 43 from AS 2024-08-15 16:01:54,606 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.525e+00 2024-08-15 16:02:03,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3259220.0, ans=0.125 2024-08-15 16:02:04,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7100, loss[loss=0.1069, beats_loss=0.009056, ecapa_loss=0.0001237, whisper_loss=0.09657, over 19682.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.000148, whisper_loss=0.08989, over 3903306.03 frames. ], batch size: 71, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:02:08,903 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 from AS 2024-08-15 16:02:10,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3259220.0, ans=0.1 2024-08-15 16:02:14,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3259220.0, ans=0.025 2024-08-15 16:02:19,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3259320.0, ans=0.0 2024-08-15 16:02:26,148 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 from AS 2024-08-15 16:02:37,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3259420.0, ans=0.125 2024-08-15 16:02:43,073 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts.
20 from LS+wenet, 31 from Vox, 35 from AS 2024-08-15 16:02:52,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3259520.0, ans=0.0 2024-08-15 16:03:02,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3259620.0, ans=0.0 2024-08-15 16:03:11,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3259620.0, ans=0.0 2024-08-15 16:03:13,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=10.0 2024-08-15 16:03:15,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7150, loss[loss=0.122, beats_loss=0.00866, ecapa_loss=0.0001668, whisper_loss=0.1116, over 21670.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.000148, whisper_loss=0.08991, over 3926858.34 frames. ], batch size: 86, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:03:31,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.305e+01 2.549e+01 2.852e+01 2.933e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-15 16:03:52,452 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 19 from Vox, 39 from AS 2024-08-15 16:03:55,485 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 11 from Vox, 35 from AS 2024-08-15 16:04:02,842 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
20 from LS+wenet, 22 from Vox, 27 from AS 2024-08-15 16:04:06,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3260020.0, ans=0.125 2024-08-15 16:04:22,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3260120.0, ans=0.125 2024-08-15 16:04:26,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7200, loss[loss=0.07155, beats_loss=0.01393, ecapa_loss=0.0001679, whisper_loss=0.05594, over 18480.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001486, whisper_loss=0.08995, over 3913451.32 frames. ], batch size: 79, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:05:03,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3260420.0, ans=0.125 2024-08-15 16:05:04,541 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 23 from Vox, 32 from AS 2024-08-15 16:05:23,113 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 from AS 2024-08-15 16:05:33,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3260620.0, ans=0.125 2024-08-15 16:05:37,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7250, loss[loss=0.1246, beats_loss=0.009795, ecapa_loss=0.0001163, whisper_loss=0.1137, over 15239.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001491, whisper_loss=0.09067, over 3907971.69 frames. ], batch size: 57, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:05:41,846 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:05:45,682 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
19 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-15 16:05:46,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-15 16:05:52,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.362e+01 2.587e+01 2.816e+01 1.917e+02, threshold=5.173e+01, percent-clipped=1.0 2024-08-15 16:06:00,521 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 16:06:02,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-08-15 16:06:11,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3260920.0, ans=0.125 2024-08-15 16:06:13,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3260920.0, ans=0.0 2024-08-15 16:06:32,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3261120.0, ans=0.2 2024-08-15 16:06:46,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7300, loss[loss=0.1019, beats_loss=0.009469, ecapa_loss=0.0001702, whisper_loss=0.0907, over 14737.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001493, whisper_loss=0.09125, over 3912463.97 frames. ], batch size: 59, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:07:07,761 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 16:07:16,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-15 16:07:17,734 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
40 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 16:07:33,898 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 16:07:57,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7350, loss[loss=0.1143, beats_loss=0.007681, ecapa_loss=0.0001292, whisper_loss=0.1054, over 19199.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001499, whisper_loss=0.09114, over 3905614.72 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:08:13,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.328e+01 2.533e+01 2.862e+01 3.908e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-15 16:08:13,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3261820.0, ans=0.05 2024-08-15 16:08:16,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3261820.0, ans=0.07 2024-08-15 16:08:23,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3261820.0, ans=0.125 2024-08-15 16:08:31,660 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 16:08:32,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3261920.0, ans=0.125 2024-08-15 16:08:35,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3261920.0, ans=0.125 2024-08-15 16:09:05,760 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 16:09:08,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7400, loss[loss=0.1086, beats_loss=0.007452, ecapa_loss=0.0001329, whisper_loss=0.09985, over 17112.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.0001497, whisper_loss=0.09136, over 3901495.01 frames. ], batch size: 63, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:09:17,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.13 vs. limit=15.0 2024-08-15 16:09:19,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3262220.0, ans=0.2 2024-08-15 16:09:26,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3262320.0, ans=0.0 2024-08-15 16:09:52,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3262520.0, ans=0.09899494936611666 2024-08-15 16:10:12,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-15 16:10:12,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3262620.0, ans=0.125 2024-08-15 16:10:17,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7450, loss[loss=0.1107, beats_loss=0.007853, ecapa_loss=0.0001413, whisper_loss=0.1015, over 14428.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001501, whisper_loss=0.0913, over 3876682.49 frames. ], batch size: 55, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:10:27,793 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 16:10:30,651 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 16:10:32,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.336e+01 2.535e+01 2.838e+01 5.757e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-15 16:10:39,838 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 16:10:41,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3262820.0, ans=0.125 2024-08-15 16:10:44,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3262920.0, ans=0.1 2024-08-15 16:10:46,396 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 16:10:56,498 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 16:10:58,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3263020.0, ans=0.125 2024-08-15 16:11:07,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3263020.0, ans=0.1 2024-08-15 16:11:09,168 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-15 16:11:16,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3263120.0, ans=0.2 2024-08-15 16:11:27,202 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7500, loss[loss=0.1093, beats_loss=0.009031, ecapa_loss=0.0001454, whisper_loss=0.09882, over 22108.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01045, ecapa_loss=0.0001495, whisper_loss=0.09172, over 3898556.33 frames. 
], batch size: 83, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:11:31,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3263220.0, ans=0.1 2024-08-15 16:11:37,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3263220.0, ans=0.0 2024-08-15 16:11:37,608 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:11:37,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.52 vs. limit=22.5 2024-08-15 16:11:47,267 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-15 16:11:48,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3263320.0, ans=0.04949747468305833 2024-08-15 16:11:54,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3263420.0, ans=0.0 2024-08-15 16:12:00,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3263420.0, ans=0.0 2024-08-15 16:12:01,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. 
limit=15.0 2024-08-15 16:12:02,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3263420.0, ans=0.125 2024-08-15 16:12:09,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3263520.0, ans=0.07 2024-08-15 16:12:18,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-15 16:12:37,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7550, loss[loss=0.1325, beats_loss=0.00719, ecapa_loss=0.0001518, whisper_loss=0.1238, over 23176.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001503, whisper_loss=0.0911, over 3880095.56 frames. ], batch size: 90, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:12:43,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3263720.0, ans=0.125 2024-08-15 16:12:52,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.288e+01 2.542e+01 2.895e+01 9.119e+01, threshold=5.085e+01, percent-clipped=2.0 2024-08-15 16:12:58,575 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 16:13:00,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=15.0 2024-08-15 16:13:02,924 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 16:13:07,252 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 16:13:12,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3263920.0, ans=0.1 2024-08-15 16:13:20,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3264020.0, ans=0.09899494936611666 2024-08-15 16:13:24,203 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 16:13:30,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-08-15 16:13:40,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3264120.0, ans=0.125 2024-08-15 16:13:48,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7600, loss[loss=0.1064, beats_loss=0.01112, ecapa_loss=0.0001364, whisper_loss=0.09392, over 22112.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001511, whisper_loss=0.09034, over 3861263.07 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:13:50,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3264220.0, ans=0.0 2024-08-15 16:14:00,445 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 16:14:09,079 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 16:14:22,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3264420.0, ans=0.125 2024-08-15 16:14:46,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3264620.0, ans=0.0 2024-08-15 16:14:48,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3264620.0, ans=0.0 2024-08-15 16:15:00,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7650, loss[loss=0.09299, beats_loss=0.01134, ecapa_loss=0.0001328, whisper_loss=0.08032, over 19880.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001503, whisper_loss=0.09103, over 3848882.02 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:15:02,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3264720.0, ans=0.125 2024-08-15 16:15:04,666 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 16:15:07,609 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 16:15:08,995 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 16:15:11,938 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 16:15:15,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.326e+01 2.582e+01 2.912e+01 5.220e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-15 16:15:21,673 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.908e-01 2024-08-15 16:15:30,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3264920.0, ans=0.0 2024-08-15 16:15:40,091 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 16:15:43,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3265020.0, ans=0.125 2024-08-15 16:15:59,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2024-08-15 16:16:02,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3265120.0, ans=0.2 2024-08-15 16:16:11,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7700, loss[loss=0.08797, beats_loss=0.01081, ecapa_loss=0.000187, whisper_loss=0.07529, over 14152.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001508, whisper_loss=0.09136, over 3862588.96 frames. 
], batch size: 57, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:16:19,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3265220.0, ans=0.1 2024-08-15 16:16:36,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3265320.0, ans=0.09899494936611666 2024-08-15 16:17:23,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3265720.0, ans=0.125 2024-08-15 16:17:24,498 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7750, loss[loss=0.09136, beats_loss=0.01194, ecapa_loss=0.0001367, whisper_loss=0.07805, over 22668.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001498, whisper_loss=0.08998, over 3890904.84 frames. ], batch size: 91, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:17:39,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3265720.0, ans=0.0 2024-08-15 16:17:46,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.309e+01 2.587e+01 2.792e+01 3.462e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-15 16:17:49,240 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 16:18:15,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3265920.0, ans=0.125 2024-08-15 16:18:20,885 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 16:18:30,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3266020.0, ans=0.125 2024-08-15 16:18:49,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3266220.0, ans=0.0 2024-08-15 16:18:50,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266220.0, ans=0.1 2024-08-15 16:18:50,826 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7800, loss[loss=0.0743, beats_loss=0.01061, ecapa_loss=0.0001505, whisper_loss=0.06218, over 13963.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09049, over 3885747.94 frames. ], batch size: 57, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:18:51,081 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 16:19:20,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2024-08-15 16:19:36,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-08-15 16:19:37,904 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 16:19:54,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3266520.0, ans=0.0 2024-08-15 16:20:03,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3266520.0, ans=0.0 2024-08-15 16:20:07,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3266520.0, ans=0.07 2024-08-15 16:20:16,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-08-15 16:20:18,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3266620.0, ans=10.0 2024-08-15 16:20:33,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7850, loss[loss=0.09148, beats_loss=0.01292, ecapa_loss=0.0001573, whisper_loss=0.07699, over 16648.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001493, whisper_loss=0.09037, over 3882815.84 frames. ], batch size: 66, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:20:56,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.312e+01 2.657e+01 2.999e+01 5.998e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-15 16:20:58,731 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-15 16:21:12,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3266920.0, ans=0.125 2024-08-15 16:21:15,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3266920.0, ans=0.125 2024-08-15 16:21:34,202 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 16:21:52,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3267020.0, ans=0.1 2024-08-15 16:22:11,120 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 16:22:16,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3267120.0, ans=0.125 2024-08-15 16:22:16,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=12.0 2024-08-15 16:22:21,337 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7900, loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.000139, whisper_loss=0.08998, over 22569.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001498, whisper_loss=0.09029, over 3900982.46 frames. ], batch size: 86, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:22:25,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.40 vs. limit=10.0 2024-08-15 16:22:38,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3267220.0, ans=0.0 2024-08-15 16:23:02,835 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 16:23:07,235 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 16:23:33,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3267520.0, ans=0.1 2024-08-15 16:23:38,949 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-15 16:23:39,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3267520.0, ans=0.125 2024-08-15 16:23:47,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3267520.0, ans=0.125 2024-08-15 16:23:51,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3267520.0, ans=0.125 2024-08-15 16:23:53,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3267520.0, ans=0.0 2024-08-15 16:24:27,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7950, loss[loss=0.09356, beats_loss=0.009361, ecapa_loss=0.0001721, whisper_loss=0.08248, over 22459.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001498, whisper_loss=0.09028, over 3910688.34 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:24:52,830 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 16:24:53,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2024-08-15 16:24:53,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.363e+01 2.541e+01 2.931e+01 3.622e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-15 16:24:55,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3267820.0, ans=0.125 2024-08-15 16:25:43,622 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 16:25:45,972 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.863e-01 2024-08-15 16:25:47,806 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 16:25:50,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3268020.0, ans=0.125 2024-08-15 16:26:33,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8000, loss[loss=0.0931, beats_loss=0.01395, ecapa_loss=0.000129, whisper_loss=0.07786, over 23253.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.0001493, whisper_loss=0.09002, over 3882481.38 frames. ], batch size: 95, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:27:13,086 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 16:27:15,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2024-08-15 16:27:24,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3268420.0, ans=0.125 2024-08-15 16:27:31,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. 
limit=6.0 2024-08-15 16:27:42,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3268420.0, ans=0.0 2024-08-15 16:27:49,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3268520.0, ans=0.2 2024-08-15 16:28:15,912 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8050, loss[loss=0.116, beats_loss=0.01019, ecapa_loss=0.0001101, whisper_loss=0.1047, over 21399.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001488, whisper_loss=0.08988, over 3893610.33 frames. ], batch size: 81, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:28:32,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.283e+01 2.526e+01 2.890e+01 4.835e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-15 16:28:34,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.61 vs. limit=22.5 2024-08-15 16:28:38,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-15 16:29:15,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3269020.0, ans=0.125 2024-08-15 16:29:32,323 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 16:29:35,051 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8100, loss[loss=0.1233, beats_loss=0.008769, ecapa_loss=0.0001312, whisper_loss=0.1133, over 14347.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001494, whisper_loss=0.09024, over 3858772.13 frames. 
], batch size: 54, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:29:42,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-15 16:29:42,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2024-08-15 16:30:55,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3269720.0, ans=0.1 2024-08-15 16:30:56,638 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8150, loss[loss=0.09939, beats_loss=0.01114, ecapa_loss=0.0001301, whisper_loss=0.08695, over 14328.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01072, ecapa_loss=0.000149, whisper_loss=0.08915, over 3854066.36 frames. ], batch size: 57, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:30:59,188 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 16:31:15,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.201e+01 2.455e+01 2.771e+01 3.780e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-15 16:31:15,756 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 16:31:17,137 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-15 16:31:21,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3269820.0, ans=0.05 2024-08-15 16:31:23,800 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-15 16:31:25,749 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 16:31:28,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3269920.0, ans=0.125 2024-08-15 16:31:42,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-15 16:31:55,978 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 16:32:11,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3270120.0, ans=0.125 2024-08-15 16:32:16,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8200, loss[loss=0.0833, beats_loss=0.01232, ecapa_loss=0.0001391, whisper_loss=0.06959, over 19597.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01075, ecapa_loss=0.0001494, whisper_loss=0.0891, over 3885327.04 frames. ], batch size: 79, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:32:48,131 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 16:33:01,460 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 16:33:04,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3270520.0, ans=0.0 2024-08-15 16:33:08,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-15 16:33:09,375 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 16:33:12,702 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 16:33:12,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3270520.0, ans=0.1 2024-08-15 16:33:19,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3270620.0, ans=0.125 2024-08-15 16:33:30,971 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 27 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 16:33:34,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8250, loss[loss=0.0985, beats_loss=0.011, ecapa_loss=0.0001982, whisper_loss=0.08552, over 18886.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01076, ecapa_loss=0.0001495, whisper_loss=0.08948, over 3884847.59 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:33:37,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3270720.0, ans=0.125 2024-08-15 16:33:40,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3270720.0, ans=0.0 2024-08-15 16:33:41,432 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 16:33:50,940 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.400e+01 2.685e+01 3.048e+01 2.636e+02, threshold=5.369e+01, percent-clipped=3.0 2024-08-15 16:33:59,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=12.0 2024-08-15 16:34:06,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3270920.0, ans=0.125 2024-08-15 16:34:19,389 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 16:34:21,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3271020.0, ans=0.0 2024-08-15 16:34:25,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=12.0 2024-08-15 16:34:39,772 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 16:34:48,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8300, loss[loss=0.1044, beats_loss=0.0129, ecapa_loss=0.0001435, whisper_loss=0.09006, over 21922.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.0001486, whisper_loss=0.08949, over 3868445.14 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 1.152921504606847e+18 2024-08-15 16:34:59,419 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 16:35:12,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3271320.0, ans=0.125 2024-08-15 16:35:24,391 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:35:36,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3271520.0, ans=0.1 2024-08-15 16:35:39,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3271520.0, ans=0.2 2024-08-15 16:35:44,758 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 16:35:51,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3271620.0, ans=0.0 2024-08-15 16:36:02,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8350, loss[loss=0.1114, beats_loss=0.01115, ecapa_loss=0.0001583, whisper_loss=0.09862, over 21038.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01068, ecapa_loss=0.0001494, whisper_loss=0.08922, over 3854052.98 frames. ], batch size: 85, lr: 2.70e-03, grad_scale: 1.152921504606847e+18 2024-08-15 16:36:03,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3271720.0, ans=0.1 2024-08-15 16:36:15,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-15 16:36:17,706 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 16:36:18,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.348e+01 2.577e+01 2.897e+01 4.165e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-15 16:36:20,650 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 16:36:39,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-15 16:37:17,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8400, loss[loss=0.1113, beats_loss=0.01034, ecapa_loss=0.0001357, whisper_loss=0.09962, over 17388.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001494, whisper_loss=0.09083, over 3871099.71 frames. 
], batch size: 67, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:37:26,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3272220.0, ans=0.125 2024-08-15 16:37:36,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-15 16:37:49,332 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 16:38:17,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3272620.0, ans=0.125 2024-08-15 16:38:18,073 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 16:38:18,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3272620.0, ans=0.02 2024-08-15 16:38:32,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8450, loss[loss=0.117, beats_loss=0.01058, ecapa_loss=0.000122, whisper_loss=0.1052, over 24443.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001502, whisper_loss=0.09071, over 3876072.36 frames. ], batch size: 92, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:38:36,178 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 16:38:48,123 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
28 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-15 16:38:50,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.308e+01 2.508e+01 2.815e+01 5.021e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-15 16:39:03,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3272920.0, ans=0.125 2024-08-15 16:39:06,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3272920.0, ans=0.125 2024-08-15 16:39:07,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3272920.0, ans=0.1 2024-08-15 16:39:36,751 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 16:39:39,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3273120.0, ans=0.0 2024-08-15 16:39:39,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2024-08-15 16:39:41,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-15 16:39:47,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8500, loss[loss=0.1111, beats_loss=0.011, ecapa_loss=0.0001334, whisper_loss=0.09875, over 20585.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001497, whisper_loss=0.09105, over 3900575.04 frames. 
], batch size: 79, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:39:48,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3273220.0, ans=0.1 2024-08-15 16:39:50,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5 2024-08-15 16:39:58,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3273220.0, ans=0.2 2024-08-15 16:40:14,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3273320.0, ans=0.125 2024-08-15 16:41:05,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8550, loss[loss=0.1007, beats_loss=0.01177, ecapa_loss=0.0001395, whisper_loss=0.08755, over 21973.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001499, whisper_loss=0.09121, over 3895131.55 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:41:22,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.07 vs. 
limit=15.0 2024-08-15 16:41:23,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.398e+01 2.637e+01 2.998e+01 4.357e+01, threshold=5.275e+01, percent-clipped=0.0 2024-08-15 16:41:28,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3273820.0, ans=0.125 2024-08-15 16:41:40,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3273920.0, ans=0.125 2024-08-15 16:41:47,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3273920.0, ans=0.5 2024-08-15 16:41:59,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3274020.0, ans=0.125 2024-08-15 16:42:21,701 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8600, loss[loss=0.1116, beats_loss=0.009273, ecapa_loss=0.000161, whisper_loss=0.1007, over 18498.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001503, whisper_loss=0.09156, over 3880915.62 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:42:23,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3274220.0, ans=0.125 2024-08-15 16:42:33,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3274220.0, ans=0.0 2024-08-15 16:42:49,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3274320.0, ans=0.125 2024-08-15 16:42:51,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.14 vs. 
limit=12.0 2024-08-15 16:42:55,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3274420.0, ans=0.125 2024-08-15 16:42:56,723 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 16:43:02,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3274420.0, ans=0.0 2024-08-15 16:43:02,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274420.0, ans=0.1 2024-08-15 16:43:15,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3274520.0, ans=0.0 2024-08-15 16:43:19,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3274520.0, ans=0.0 2024-08-15 16:43:37,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-08-15 16:43:37,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8650, loss[loss=0.09983, beats_loss=0.01005, ecapa_loss=0.0002111, whisper_loss=0.08767, over 20659.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001515, whisper_loss=0.09084, over 3854986.86 frames. ], batch size: 92, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:43:53,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274820.0, ans=0.1 2024-08-15 16:43:55,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.305e+01 2.531e+01 2.832e+01 4.112e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 16:44:01,735 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
13 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 16:44:06,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3274920.0, ans=0.2 2024-08-15 16:44:15,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-08-15 16:44:20,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3274920.0, ans=0.125 2024-08-15 16:44:26,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3275020.0, ans=0.0 2024-08-15 16:44:31,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3275020.0, ans=0.0 2024-08-15 16:44:37,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3275120.0, ans=0.125 2024-08-15 16:44:53,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-08-15 16:44:53,550 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8700, loss[loss=0.08581, beats_loss=0.01213, ecapa_loss=0.000142, whisper_loss=0.07226, over 21806.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001506, whisper_loss=0.09044, over 3840799.53 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:44:57,101 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 16:44:58,300 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 16:44:58,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-08-15 16:45:07,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0 2024-08-15 16:45:15,702 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 16:45:48,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3275520.0, ans=0.125 2024-08-15 16:46:11,303 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8750, loss[loss=0.1236, beats_loss=0.01036, ecapa_loss=0.0001569, whisper_loss=0.1117, over 23167.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01047, ecapa_loss=0.0001513, whisper_loss=0.09108, over 3863811.89 frames. ], batch size: 94, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:46:14,833 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 29 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 16:46:16,110 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 16:46:19,157 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-15 16:46:22,347 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 16:46:23,523 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 16:46:28,147 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
18 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 16:46:29,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.332e+01 2.573e+01 2.934e+01 5.671e+01, threshold=5.146e+01, percent-clipped=2.0 2024-08-15 16:46:42,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3275820.0, ans=0.0 2024-08-15 16:46:45,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.41 vs. limit=22.5 2024-08-15 16:46:49,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3275920.0, ans=0.125 2024-08-15 16:46:56,984 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 37 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-15 16:47:05,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2024-08-15 16:47:07,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3276020.0, ans=0.1 2024-08-15 16:47:12,830 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 16:47:14,937 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 16:47:29,873 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 16:47:39,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8800, loss[loss=0.1148, beats_loss=0.009666, ecapa_loss=0.0001927, whisper_loss=0.1032, over 22080.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001517, whisper_loss=0.09082, over 3841382.10 frames. 
], batch size: 90, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:48:25,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3276420.0, ans=0.0 2024-08-15 16:48:25,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=22.5 2024-08-15 16:48:42,614 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 16:48:47,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276520.0, ans=0.1 2024-08-15 16:48:57,474 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 16:48:58,999 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 16:49:09,122 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 16:49:15,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8850, loss[loss=0.1062, beats_loss=0.01286, ecapa_loss=0.0001277, whisper_loss=0.0921, over 22054.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001504, whisper_loss=0.09018, over 3884207.17 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:49:29,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3276720.0, ans=0.125 2024-08-15 16:49:36,989 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.293e+01 2.672e+01 3.024e+01 1.700e+02, threshold=5.345e+01, percent-clipped=3.0 2024-08-15 16:49:46,174 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
26 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-15 16:49:47,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3276920.0, ans=0.125 2024-08-15 16:49:49,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3276920.0, ans=0.125 2024-08-15 16:49:54,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3276920.0, ans=0.125 2024-08-15 16:49:55,509 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 16:50:03,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3277020.0, ans=0.95 2024-08-15 16:50:03,726 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:50:05,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2024-08-15 16:50:14,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3277020.0, ans=0.04949747468305833 2024-08-15 16:50:34,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3277220.0, ans=0.125 2024-08-15 16:50:35,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8900, loss[loss=0.09798, beats_loss=0.01124, ecapa_loss=0.0001452, whisper_loss=0.08529, over 20771.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001498, whisper_loss=0.09102, over 3883005.20 frames. 
], batch size: 85, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:50:37,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3277220.0, ans=0.125 2024-08-15 16:50:38,634 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 16:50:39,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3277220.0, ans=0.0 2024-08-15 16:50:53,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3277320.0, ans=0.0 2024-08-15 16:50:57,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3277320.0, ans=0.07 2024-08-15 16:51:06,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3277420.0, ans=0.125 2024-08-15 16:51:36,375 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-15 16:51:45,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3277620.0, ans=0.125 2024-08-15 16:51:49,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8950, loss[loss=0.1066, beats_loss=0.009625, ecapa_loss=0.0001494, whisper_loss=0.09548, over 18879.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001497, whisper_loss=0.09017, over 3870029.13 frames. ], batch size: 74, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:51:54,267 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 13 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 16:51:57,072 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 16:52:06,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.302e+01 2.531e+01 2.768e+01 4.662e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 16:52:23,799 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 16:52:29,835 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-15 16:52:34,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3278020.0, ans=0.0 2024-08-15 16:52:41,850 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 16:52:51,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3278120.0, ans=0.0 2024-08-15 16:52:55,030 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 25 from LS+wenet, 10 from Vox, 20 fro AS 2024-08-15 16:53:01,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9000, loss[loss=0.0969, beats_loss=0.0141, ecapa_loss=0.0001065, whisper_loss=0.08174, over 21502.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001505, whisper_loss=0.0901, over 3874186.76 frames. ], batch size: 85, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:53:01,956 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 16:53:39,099 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2514, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2461, over 922467.00 frames. 2024-08-15 16:53:57,670 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004212, beats_loss=0, ecapa_loss=0.0004212, whisper_loss=0, over 939242.00 frames. 
2024-08-15 16:55:49,245 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 16:55:49,255 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 16:55:49,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2024-08-15 16:55:58,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3278220.0, ans=0.125 2024-08-15 16:56:01,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2024-08-15 16:56:31,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-15 16:56:47,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=22.5 2024-08-15 16:56:57,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2024-08-15 16:56:59,650 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.792e-03 2024-08-15 16:57:03,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9050, loss[loss=0.09564, beats_loss=0.01101, ecapa_loss=0.0001251, whisper_loss=0.08339, over 21248.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001502, whisper_loss=0.09017, over 3886384.80 frames. 
], batch size: 81, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:57:10,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3278720.0, ans=0.125 2024-08-15 16:57:11,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3278720.0, ans=0.125 2024-08-15 16:57:21,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.395e+01 2.682e+01 2.921e+01 1.898e+02, threshold=5.364e+01, percent-clipped=1.0 2024-08-15 16:57:26,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3278820.0, ans=0.1 2024-08-15 16:57:29,150 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 16:57:35,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3278920.0, ans=0.1 2024-08-15 16:57:44,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3278920.0, ans=0.025 2024-08-15 16:57:48,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2024-08-15 16:58:04,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3279120.0, ans=0.0 2024-08-15 16:58:05,240 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 16:58:06,416 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 16:58:17,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9100, loss[loss=0.1027, beats_loss=0.009229, ecapa_loss=0.0001568, whisper_loss=0.09188, over 19437.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001508, whisper_loss=0.09034, over 3880838.56 frames. ], batch size: 79, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:58:26,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-15 16:58:32,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3279320.0, ans=0.04949747468305833 2024-08-15 16:58:38,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3279320.0, ans=0.125 2024-08-15 16:58:46,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3279420.0, ans=0.125 2024-08-15 16:59:13,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-15 16:59:29,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3279720.0, ans=0.125 2024-08-15 16:59:30,363 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9150, loss[loss=0.1115, beats_loss=0.01133, ecapa_loss=0.000163, whisper_loss=0.09858, over 21039.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.00015, whisper_loss=0.09017, over 3887205.39 frames. 
], batch size: 86, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:59:47,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.265e+01 2.493e+01 2.729e+01 3.385e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 17:00:09,332 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-328000.pt 2024-08-15 17:00:41,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3280120.0, ans=0.125 2024-08-15 17:00:42,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3280120.0, ans=10.0 2024-08-15 17:00:45,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3280220.0, ans=0.0 2024-08-15 17:00:46,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9200, loss[loss=0.1012, beats_loss=0.01092, ecapa_loss=0.000153, whisper_loss=0.08875, over 13604.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01072, ecapa_loss=0.0001495, whisper_loss=0.08938, over 3859274.13 frames. ], batch size: 54, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:00:55,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3280220.0, ans=0.125 2024-08-15 17:01:01,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3280320.0, ans=0.04949747468305833 2024-08-15 17:01:08,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. 
limit=15.0 2024-08-15 17:01:09,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=12.0 2024-08-15 17:01:15,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280420.0, ans=0.1 2024-08-15 17:01:16,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3280420.0, ans=0.125 2024-08-15 17:01:28,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-15 17:01:41,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3280520.0, ans=0.125 2024-08-15 17:01:42,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-15 17:02:00,517 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9250, loss[loss=0.1003, beats_loss=0.01035, ecapa_loss=0.0001612, whisper_loss=0.08834, over 21701.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01069, ecapa_loss=0.0001502, whisper_loss=0.08911, over 3897504.36 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:02:00,697 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 17:02:02,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3280720.0, ans=0.125 2024-08-15 17:02:05,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.95 vs. 
limit=22.5 2024-08-15 17:02:11,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=15.0 2024-08-15 17:02:12,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280720.0, ans=0.1 2024-08-15 17:02:17,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.360e+01 2.606e+01 2.888e+01 4.280e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 17:02:40,899 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 17:02:45,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3281020.0, ans=0.0 2024-08-15 17:02:56,908 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 17:02:58,168 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 17:03:09,160 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 17:03:14,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9300, loss[loss=0.09391, beats_loss=0.009758, ecapa_loss=0.0001284, whisper_loss=0.08286, over 17264.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001498, whisper_loss=0.08951, over 3893467.68 frames. ], batch size: 66, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:03:19,445 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-15 17:03:32,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3281320.0, ans=0.0 2024-08-15 17:03:38,911 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 17:04:02,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3281520.0, ans=0.0 2024-08-15 17:04:14,161 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:04:31,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3281720.0, ans=0.1 2024-08-15 17:04:32,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9350, loss[loss=0.1009, beats_loss=0.01135, ecapa_loss=0.0001405, whisper_loss=0.08815, over 21971.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001493, whisper_loss=0.0902, over 3901738.97 frames. ], batch size: 87, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:04:42,404 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 17:04:47,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-15 17:04:51,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.266e+01 2.526e+01 2.881e+01 4.072e+01, threshold=5.051e+01, percent-clipped=0.0 2024-08-15 17:05:09,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3281920.0, ans=0.125 2024-08-15 17:05:17,527 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-15 17:05:33,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.43 vs. 
limit=22.5 2024-08-15 17:05:49,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9400, loss[loss=0.09693, beats_loss=0.01063, ecapa_loss=0.0001542, whisper_loss=0.08476, over 17133.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001495, whisper_loss=0.09015, over 3886564.13 frames. ], batch size: 69, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:06:00,500 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 17:06:00,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3282220.0, ans=0.05 2024-08-15 17:06:10,115 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 17:06:24,147 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 17:06:24,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3282420.0, ans=0.0 2024-08-15 17:06:24,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-15 17:06:27,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3282420.0, ans=0.125 2024-08-15 17:06:28,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3282420.0, ans=0.125 2024-08-15 17:06:31,666 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
22 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-15 17:06:41,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3282520.0, ans=0.125 2024-08-15 17:06:46,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3282520.0, ans=0.0 2024-08-15 17:06:48,154 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 17:06:48,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3282520.0, ans=0.5 2024-08-15 17:07:08,892 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9450, loss[loss=0.08596, beats_loss=0.01422, ecapa_loss=0.0001375, whisper_loss=0.07036, over 20650.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001493, whisper_loss=0.08951, over 3888733.50 frames. ], batch size: 85, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:07:18,593 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 17:07:21,526 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-15 17:07:25,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3282820.0, ans=0.2 2024-08-15 17:07:27,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.277e+01 2.590e+01 2.788e+01 7.153e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-15 17:07:33,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=22.5 2024-08-15 17:07:44,445 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 17:07:48,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3282920.0, ans=0.0 2024-08-15 17:07:53,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3282920.0, ans=0.0 2024-08-15 17:07:53,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-15 17:08:02,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2024-08-15 17:08:10,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3283120.0, ans=0.125 2024-08-15 17:08:11,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3283120.0, ans=0.2 2024-08-15 17:08:26,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9500, loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001286, whisper_loss=0.09146, over 23334.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001502, whisper_loss=0.08955, over 3878009.62 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:08:50,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3283320.0, ans=0.125 2024-08-15 17:09:02,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3283420.0, ans=0.125 2024-08-15 17:09:30,291 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 17:09:32,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3283620.0, ans=0.2 2024-08-15 17:09:35,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3283620.0, ans=0.0 2024-08-15 17:09:40,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9550, loss[loss=0.1106, beats_loss=0.01167, ecapa_loss=0.0001307, whisper_loss=0.09766, over 20213.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.000151, whisper_loss=0.08932, over 3877149.33 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:09:45,121 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 17:09:57,822 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.381e+01 2.631e+01 2.929e+01 4.005e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-15 17:10:11,663 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 17:10:17,033 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-15 17:10:26,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3284020.0, ans=0.0 2024-08-15 17:10:32,057 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 17:10:45,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3284120.0, ans=0.125 2024-08-15 17:10:54,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9600, loss[loss=0.113, beats_loss=0.009426, ecapa_loss=0.000195, whisper_loss=0.1016, over 20397.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001493, whisper_loss=0.09004, over 3851005.79 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:10:58,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-15 17:10:59,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3284220.0, ans=10.0 2024-08-15 17:11:08,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3284320.0, ans=0.0 2024-08-15 17:11:16,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3284320.0, ans=0.0 2024-08-15 17:11:18,168 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 17:11:33,200 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 17:11:33,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3284420.0, ans=0.125 2024-08-15 17:11:48,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3284520.0, ans=0.1 2024-08-15 17:11:49,395 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-15 17:11:59,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.82 vs. limit=5.0 2024-08-15 17:11:59,549 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
36 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-15 17:12:08,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9650, loss[loss=0.09058, beats_loss=0.009855, ecapa_loss=0.0001507, whisper_loss=0.07922, over 20477.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001502, whisper_loss=0.09069, over 3840554.18 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:12:15,790 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 17:12:18,810 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 17:12:25,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.315e+01 2.525e+01 2.831e+01 4.515e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-15 17:12:30,661 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 17:12:30,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3284820.0, ans=0.125 2024-08-15 17:12:45,298 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 17:13:03,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3285020.0, ans=0.125 2024-08-15 17:13:06,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3285120.0, ans=0.1 2024-08-15 17:13:20,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-15 17:13:21,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9700, loss[loss=0.08448, beats_loss=0.009215, ecapa_loss=0.0001588, whisper_loss=0.07368, over 22143.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09032, over 3847546.56 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:13:21,699 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-15 17:14:09,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3285320.0, ans=0.1 2024-08-15 17:14:18,566 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 17:14:28,924 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-15 17:14:44,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3285420.0, ans=0.2 2024-08-15 17:14:59,423 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 10 from Vox, 47 fro AS 2024-08-15 17:15:03,969 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 17:15:05,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3285620.0, ans=0.125 2024-08-15 17:15:12,698 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 17:15:19,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9750, loss[loss=0.126, beats_loss=0.01023, ecapa_loss=0.0001066, whisper_loss=0.1147, over 22624.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001488, whisper_loss=0.08988, over 3846070.66 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:15:24,572 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 17:15:26,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3285720.0, ans=0.125 2024-08-15 17:15:31,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3285720.0, ans=0.125 2024-08-15 17:15:33,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3285820.0, ans=0.2 2024-08-15 17:15:40,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.300e+01 2.530e+01 2.812e+01 4.314e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 17:16:05,666 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 17:16:30,363 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 17:16:30,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3286020.0, ans=0.1 2024-08-15 17:16:52,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3286120.0, ans=0.05 2024-08-15 17:16:58,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0 2024-08-15 17:16:59,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3286120.0, ans=0.025 2024-08-15 17:17:02,577 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9800, loss[loss=0.08526, beats_loss=0.01279, ecapa_loss=0.0001289, whisper_loss=0.07117, over 19674.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001478, whisper_loss=0.09024, over 3841275.20 frames. 
], batch size: 81, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:18:03,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3286520.0, ans=0.5 2024-08-15 17:18:09,133 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-15 17:18:22,483 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 17:18:53,634 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9850, loss[loss=0.1023, beats_loss=0.01077, ecapa_loss=0.0001628, whisper_loss=0.08987, over 17896.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001472, whisper_loss=0.09032, over 3840744.42 frames. ], batch size: 72, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:18:59,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3286720.0, ans=0.0 2024-08-15 17:19:06,431 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 17:19:14,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3286720.0, ans=0.125 2024-08-15 17:19:16,878 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-15 17:19:24,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.372e+01 2.632e+01 2.935e+01 4.456e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-15 17:19:25,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3286820.0, ans=0.1 2024-08-15 17:19:52,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3286920.0, ans=0.125 2024-08-15 17:20:31,602 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 17:20:35,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.49 vs. limit=22.5 2024-08-15 17:20:41,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=8.0 2024-08-15 17:20:55,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0 2024-08-15 17:20:58,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9900, loss[loss=0.1339, beats_loss=0.007061, ecapa_loss=0.0001539, whisper_loss=0.1254, over 22766.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001471, whisper_loss=0.0909, over 3847849.35 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:20:59,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3287220.0, ans=0.5 2024-08-15 17:21:13,316 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:21:28,073 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 17:21:36,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3287320.0, ans=0.07 2024-08-15 17:22:15,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-15 17:22:17,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3287520.0, ans=0.1 2024-08-15 17:22:22,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0 2024-08-15 17:22:25,713 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 17:22:31,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3287520.0, ans=0.05 2024-08-15 17:22:33,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3287520.0, ans=0.0 2024-08-15 17:22:42,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3287620.0, ans=0.125 2024-08-15 17:22:46,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3287620.0, ans=0.025 2024-08-15 17:22:51,645 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 17:22:54,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3287620.0, ans=0.1 2024-08-15 17:23:01,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9950, loss[loss=0.08583, beats_loss=0.01286, ecapa_loss=0.0001383, whisper_loss=0.07159, over 20572.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.000147, whisper_loss=0.09053, over 3876709.29 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:23:31,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.466e+01 2.723e+01 3.016e+01 5.091e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-15 17:23:59,184 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 17:24:12,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3287920.0, ans=10.0 2024-08-15 17:24:27,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2024-08-15 17:24:38,510 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 17:24:45,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3288120.0, ans=0.2 2024-08-15 17:24:46,709 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 17:24:48,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3288220.0, ans=0.125 2024-08-15 17:24:50,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10000, loss[loss=0.1102, beats_loss=0.009736, ecapa_loss=0.0001261, whisper_loss=0.09925, over 15610.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.09087, over 3877697.10 frames. ], batch size: 60, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:24:57,466 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-15 17:25:07,332 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 17:25:21,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3288320.0, ans=0.125 2024-08-15 17:25:34,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3288420.0, ans=0.125 2024-08-15 17:25:43,171 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 17:25:48,671 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-15 17:26:01,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3288620.0, ans=0.04949747468305833 2024-08-15 17:26:05,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3288620.0, ans=0.2 2024-08-15 17:26:18,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10050, loss[loss=0.106, beats_loss=0.01129, ecapa_loss=0.0001287, whisper_loss=0.0934, over 21839.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01052, ecapa_loss=0.0001485, whisper_loss=0.0917, over 3882168.63 frames. 
], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:26:40,552 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.323e+01 2.519e+01 2.738e+01 4.374e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-15 17:26:43,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3288820.0, ans=15.0 2024-08-15 17:26:44,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2024-08-15 17:26:47,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3288820.0, ans=0.125 2024-08-15 17:26:55,247 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 17:27:01,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3288920.0, ans=0.1 2024-08-15 17:27:32,198 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 17:27:46,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289220.0, ans=0.1 2024-08-15 17:27:47,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10100, loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001238, whisper_loss=0.09234, over 20898.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001494, whisper_loss=0.09119, over 3909544.16 frames. ], batch size: 79, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:27:49,963 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
13 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 17:27:55,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3289220.0, ans=0.0 2024-08-15 17:28:21,683 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-15 17:28:44,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3289520.0, ans=0.125 2024-08-15 17:28:48,266 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 17:29:15,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3289620.0, ans=0.0 2024-08-15 17:29:18,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10150, loss[loss=0.08913, beats_loss=0.01157, ecapa_loss=0.000127, whisper_loss=0.07629, over 14013.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01045, ecapa_loss=0.0001505, whisper_loss=0.09167, over 3887923.83 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:29:24,251 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 10 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-15 17:29:26,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3289720.0, ans=0.125 2024-08-15 17:29:28,116 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:29:39,396 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.314e+01 2.537e+01 2.890e+01 1.648e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-15 17:29:44,758 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
23 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-15 17:29:59,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3289920.0, ans=0.07 2024-08-15 17:30:06,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3289920.0, ans=0.0 2024-08-15 17:30:08,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-15 17:30:10,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=22.5 2024-08-15 17:30:19,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3290020.0, ans=0.5 2024-08-15 17:30:41,121 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10200, loss[loss=0.09774, beats_loss=0.01034, ecapa_loss=0.0001569, whisper_loss=0.08583, over 22747.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.0001511, whisper_loss=0.09173, over 3886610.58 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:30:43,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3290220.0, ans=0.125 2024-08-15 17:30:59,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3290320.0, ans=0.125 2024-08-15 17:31:34,666 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
34 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 17:31:38,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3290520.0, ans=0.125 2024-08-15 17:31:44,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3290520.0, ans=0.1 2024-08-15 17:31:52,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3290620.0, ans=0.0 2024-08-15 17:32:05,219 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10250, loss[loss=0.09082, beats_loss=0.009522, ecapa_loss=0.0001898, whisper_loss=0.0794, over 18580.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01039, ecapa_loss=0.0001504, whisper_loss=0.09197, over 3871434.87 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:32:12,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3290720.0, ans=0.125 2024-08-15 17:32:15,594 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 17:32:16,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-15 17:32:25,135 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.349e+01 2.551e+01 2.894e+01 3.006e+02, threshold=5.101e+01, percent-clipped=3.0 2024-08-15 17:33:02,816 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
36 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 17:33:20,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3291120.0, ans=0.1 2024-08-15 17:33:25,780 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.844e-02 2024-08-15 17:33:26,880 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10300, loss[loss=0.1084, beats_loss=0.01001, ecapa_loss=0.0001495, whisper_loss=0.09688, over 21255.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.0001497, whisper_loss=0.09136, over 3888492.26 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:33:37,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3291220.0, ans=0.04949747468305833 2024-08-15 17:33:44,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3291320.0, ans=0.0 2024-08-15 17:33:50,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3291320.0, ans=0.2 2024-08-15 17:33:58,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3291420.0, ans=10.0 2024-08-15 17:34:16,568 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 17:34:43,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10350, loss[loss=0.09105, beats_loss=0.009687, ecapa_loss=0.0001405, whisper_loss=0.07996, over 18517.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001501, whisper_loss=0.09016, over 3911374.56 frames. 
], batch size: 74, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:34:54,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3291720.0, ans=0.125 2024-08-15 17:35:02,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.326e+01 2.650e+01 2.958e+01 2.904e+02, threshold=5.299e+01, percent-clipped=2.0 2024-08-15 17:35:23,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3291920.0, ans=0.0 2024-08-15 17:35:29,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3292020.0, ans=0.0 2024-08-15 17:35:32,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3292020.0, ans=0.0 2024-08-15 17:35:42,718 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 17:36:00,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10400, loss[loss=0.09843, beats_loss=0.009984, ecapa_loss=0.0001518, whisper_loss=0.08693, over 14859.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.00015, whisper_loss=0.09044, over 3897110.82 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:36:04,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3292220.0, ans=0.2 2024-08-15 17:36:22,062 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 17 from LS+wenet, 24 from Vox, 50 fro AS 2024-08-15 17:36:39,509 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-15 17:36:51,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3292520.0, ans=0.95 2024-08-15 17:36:52,515 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 17:36:52,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3292520.0, ans=0.125 2024-08-15 17:37:09,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-15 17:37:13,064 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 17:37:14,066 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10450, loss[loss=0.09367, beats_loss=0.01172, ecapa_loss=0.0001179, whisper_loss=0.08078, over 22503.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001505, whisper_loss=0.09035, over 3893137.29 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:37:31,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.170e+01 2.524e+01 2.860e+01 1.816e+02, threshold=5.048e+01, percent-clipped=1.0 2024-08-15 17:37:34,741 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-15 17:37:58,522 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 17:38:13,410 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 17:38:21,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. 
limit=6.0 2024-08-15 17:38:22,211 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-15 17:38:28,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10500, loss[loss=0.09519, beats_loss=0.009821, ecapa_loss=0.0001594, whisper_loss=0.08377, over 14232.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001512, whisper_loss=0.09053, over 3888397.20 frames. ], batch size: 57, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:38:59,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3293420.0, ans=0.0 2024-08-15 17:39:14,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3293520.0, ans=0.0 2024-08-15 17:39:15,710 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:39:19,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0 2024-08-15 17:39:19,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3293520.0, ans=0.125 2024-08-15 17:39:23,869 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 17:39:30,997 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
35 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 17:39:33,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3293620.0, ans=0.125 2024-08-15 17:39:33,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3293620.0, ans=0.0 2024-08-15 17:39:42,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10550, loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001499, whisper_loss=0.08901, over 17377.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001503, whisper_loss=0.09004, over 3851228.18 frames. ], batch size: 69, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:39:49,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3293720.0, ans=0.125 2024-08-15 17:39:53,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3293720.0, ans=0.0 2024-08-15 17:40:01,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.311e+01 2.619e+01 2.877e+01 4.261e+01, threshold=5.238e+01, percent-clipped=0.0 2024-08-15 17:40:09,668 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 17:40:11,620 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 17:40:14,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2024-08-15 17:40:20,074 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-15 17:40:24,727 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
20 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-15 17:40:53,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3294220.0, ans=0.0 2024-08-15 17:40:54,214 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10600, loss[loss=0.08253, beats_loss=0.01222, ecapa_loss=0.0001401, whisper_loss=0.06891, over 17886.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001502, whisper_loss=0.08994, over 3844457.77 frames. ], batch size: 70, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:40:57,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3294220.0, ans=0.125 2024-08-15 17:41:00,487 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 17:41:07,611 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 17:41:52,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3294620.0, ans=0.2 2024-08-15 17:42:02,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3294620.0, ans=0.125 2024-08-15 17:42:06,136 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10650, loss[loss=0.1008, beats_loss=0.01161, ecapa_loss=0.0001573, whisper_loss=0.08766, over 20423.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001491, whisper_loss=0.0897, over 3856138.19 frames. ], batch size: 83, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:42:06,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3294720.0, ans=0.125 2024-08-15 17:42:20,899 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
31 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 17:42:24,882 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.386e+01 2.620e+01 3.005e+01 5.015e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-15 17:42:26,758 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 17:42:38,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-15 17:42:39,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3294920.0, ans=0.09899494936611666 2024-08-15 17:42:41,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3294920.0, ans=0.125 2024-08-15 17:42:55,493 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 17:43:03,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3295120.0, ans=0.2 2024-08-15 17:43:04,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3295120.0, ans=0.0 2024-08-15 17:43:10,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3295120.0, ans=0.125 2024-08-15 17:43:18,213 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-15 17:43:19,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10700, loss[loss=0.1163, beats_loss=0.01095, ecapa_loss=0.0001153, whisper_loss=0.1042, over 16247.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001487, whisper_loss=0.09064, over 3867712.29 frames. 
], batch size: 63, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:43:42,324 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 17:43:58,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3295420.0, ans=0.125 2024-08-15 17:44:01,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3295420.0, ans=0.125 2024-08-15 17:44:05,703 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 17:44:27,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295620.0, ans=0.1 2024-08-15 17:44:27,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=12.0 2024-08-15 17:44:31,435 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-15 17:44:32,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10750, loss[loss=0.1152, beats_loss=0.007596, ecapa_loss=0.0001415, whisper_loss=0.1062, over 15070.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001487, whisper_loss=0.0915, over 3872088.14 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:44:39,966 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 17:44:43,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3295720.0, ans=0.0 2024-08-15 17:44:50,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.313e+01 2.649e+01 2.924e+01 4.383e+01, threshold=5.299e+01, percent-clipped=0.0 2024-08-15 17:45:09,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2024-08-15 17:45:42,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3296120.0, ans=0.0 2024-08-15 17:45:44,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10800, loss[loss=0.1218, beats_loss=0.008988, ecapa_loss=0.0001375, whisper_loss=0.1114, over 22175.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01046, ecapa_loss=0.0001489, whisper_loss=0.09236, over 3888302.56 frames. ], batch size: 85, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:45:45,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3296220.0, ans=0.2 2024-08-15 17:46:22,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-08-15 17:46:37,620 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 17:46:43,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-15 17:46:44,636 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-15 17:46:53,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3296620.0, ans=0.125 2024-08-15 17:46:54,316 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04891674965620041, model_norm_threshold=52.98820877075195 2024-08-15 17:46:54,506 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.961e+05, grad_sumsq=3.961e+05, orig_rms_sq=1.000e+00 2024-08-15 17:46:55,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10850, loss[loss=0.07961, beats_loss=0.01031, ecapa_loss=0.0001852, whisper_loss=0.06745, over 16922.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0106, ecapa_loss=0.0001482, whisper_loss=0.09167, over 3883577.28 frames. ], batch size: 73, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:46:58,714 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-15 17:47:06,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2024-08-15 17:47:13,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.310e+01 2.561e+01 2.919e+01 1.083e+03, threshold=5.121e+01, percent-clipped=1.0 2024-08-15 17:47:33,252 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 17:48:07,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3297220.0, ans=0.2 2024-08-15 17:48:08,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10900, loss[loss=0.1147, beats_loss=0.00611, ecapa_loss=0.0001853, whisper_loss=0.1068, over 17477.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01059, ecapa_loss=0.0001485, whisper_loss=0.09214, over 3907148.44 frames. ], batch size: 70, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:48:49,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3297420.0, ans=0.125 2024-08-15 17:49:02,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3297520.0, ans=0.0 2024-08-15 17:49:04,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3297620.0, ans=0.125 2024-08-15 17:49:20,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10950, loss[loss=0.1054, beats_loss=0.009521, ecapa_loss=0.0001342, whisper_loss=0.09455, over 22263.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.0001473, whisper_loss=0.09169, over 3901614.27 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:49:21,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3297720.0, ans=0.0 2024-08-15 17:49:32,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3297720.0, ans=0.125 2024-08-15 17:49:35,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3297820.0, ans=0.2 2024-08-15 17:49:40,224 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.372e+01 2.632e+01 3.011e+01 4.357e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-15 17:49:49,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.84 vs. 
limit=10.0 2024-08-15 17:50:05,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3298020.0, ans=0.125 2024-08-15 17:50:05,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3298020.0, ans=0.125 2024-08-15 17:50:06,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3298020.0, ans=0.0 2024-08-15 17:50:31,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11000, loss[loss=0.1069, beats_loss=0.009658, ecapa_loss=0.0001502, whisper_loss=0.09571, over 18162.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.000148, whisper_loss=0.09194, over 3939060.40 frames. ], batch size: 67, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:50:33,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3298220.0, ans=0.125 2024-08-15 17:50:36,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3298220.0, ans=0.0 2024-08-15 17:50:41,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3298220.0, ans=0.125 2024-08-15 17:50:50,491 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 17:50:50,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3298320.0, ans=0.0 2024-08-15 17:51:00,730 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 17:51:00,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3298420.0, ans=0.0 2024-08-15 17:51:26,470 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 17:51:41,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11050, loss[loss=0.09081, beats_loss=0.0106, ecapa_loss=0.0001415, whisper_loss=0.07879, over 15090.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01051, ecapa_loss=0.000149, whisper_loss=0.09212, over 3933971.87 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:51:45,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3298720.0, ans=0.125 2024-08-15 17:52:03,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.406e+01 2.620e+01 2.867e+01 4.013e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-15 17:52:11,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3298920.0, ans=0.2 2024-08-15 17:52:11,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-15 17:52:20,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3298920.0, ans=0.125 2024-08-15 17:52:24,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2024-08-15 17:52:36,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. 
limit=15.0 2024-08-15 17:52:44,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3299120.0, ans=0.2 2024-08-15 17:52:54,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11100, loss[loss=0.1035, beats_loss=0.01151, ecapa_loss=0.0001219, whisper_loss=0.09075, over 20540.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01055, ecapa_loss=0.0001478, whisper_loss=0.09214, over 3919472.14 frames. ], batch size: 79, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:52:57,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3299220.0, ans=0.95 2024-08-15 17:53:18,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3299320.0, ans=0.125 2024-08-15 17:53:24,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3299420.0, ans=0.125 2024-08-15 17:53:27,569 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 17:53:47,762 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 17:53:59,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3299620.0, ans=0.2 2024-08-15 17:54:08,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11150, loss[loss=0.1006, beats_loss=0.01202, ecapa_loss=0.0001792, whisper_loss=0.08682, over 22506.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0105, ecapa_loss=0.0001482, whisper_loss=0.09243, over 3900837.00 frames. 
], batch size: 95, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:54:20,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3299720.0, ans=0.0 2024-08-15 17:54:28,195 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.377e+01 2.635e+01 3.031e+01 4.135e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-15 17:54:38,402 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 17:54:43,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3299920.0, ans=0.125 2024-08-15 17:54:49,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3299920.0, ans=0.125 2024-08-15 17:55:04,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3300120.0, ans=0.125 2024-08-15 17:55:06,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3300120.0, ans=0.125 2024-08-15 17:55:11,688 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 17:55:16,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3300120.0, ans=0.0 2024-08-15 17:55:20,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11200, loss[loss=0.0782, beats_loss=0.01151, ecapa_loss=0.0001543, whisper_loss=0.06515, over 14421.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01047, ecapa_loss=0.0001466, whisper_loss=0.09251, over 3919087.11 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:55:23,018 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
17 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 17:55:42,720 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 17:56:20,424 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 17:56:20,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3300620.0, ans=0.125 2024-08-15 17:56:33,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11250, loss[loss=0.07843, beats_loss=0.01187, ecapa_loss=0.0001449, whisper_loss=0.06512, over 21400.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0001479, whisper_loss=0.09206, over 3901251.75 frames. ], batch size: 85, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:56:35,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3300720.0, ans=0.05 2024-08-15 17:56:53,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.493e+01 2.758e+01 4.504e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 17:57:02,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3300920.0, ans=0.0 2024-08-15 17:57:06,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3300920.0, ans=0.0 2024-08-15 17:57:14,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-15 17:57:18,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3301020.0, ans=0.125 2024-08-15 17:57:21,242 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-15 17:57:26,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-15 17:57:28,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3301020.0, ans=0.125 2024-08-15 17:57:43,612 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 17:57:44,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11300, loss[loss=0.09211, beats_loss=0.01053, ecapa_loss=0.0001184, whisper_loss=0.08039, over 16750.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01045, ecapa_loss=0.0001469, whisper_loss=0.09197, over 3888564.42 frames. ], batch size: 64, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:57:46,263 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 17:58:21,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301420.0, ans=0.1 2024-08-15 17:58:30,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2024-08-15 17:58:33,105 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-15 17:58:40,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3301520.0, ans=0.0 2024-08-15 17:58:53,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.66 vs. 
limit=12.0 2024-08-15 17:58:57,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11350, loss[loss=0.09396, beats_loss=0.01047, ecapa_loss=0.0001453, whisper_loss=0.08204, over 17860.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001485, whisper_loss=0.09087, over 3893464.92 frames. ], batch size: 69, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:59:17,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.304e+01 2.607e+01 2.921e+01 2.640e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 17:59:22,576 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 17:59:37,542 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 17:59:44,772 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 17:59:56,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3302120.0, ans=0.0 2024-08-15 17:59:57,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3302120.0, ans=0.0 2024-08-15 18:00:06,664 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 18:00:11,384 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11400, loss[loss=0.09424, beats_loss=0.01137, ecapa_loss=0.0001165, whisper_loss=0.0817, over 19392.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001479, whisper_loss=0.09148, over 3919255.57 frames. ], batch size: 77, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:00:30,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.50 vs. 
limit=15.0 2024-08-15 18:00:32,733 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 18:00:44,763 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 14 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 18:00:52,337 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 18:01:12,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3302620.0, ans=0.025 2024-08-15 18:01:19,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3302620.0, ans=0.125 2024-08-15 18:01:25,253 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 18:01:26,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11450, loss[loss=0.1144, beats_loss=0.009906, ecapa_loss=0.0001526, whisper_loss=0.1029, over 22388.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.000149, whisper_loss=0.09138, over 3911907.57 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:01:46,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.318e+01 2.537e+01 2.814e+01 4.367e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-15 18:02:01,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3302920.0, ans=0.05 2024-08-15 18:02:03,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3302920.0, ans=0.1 2024-08-15 18:02:09,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. 
limit=15.0 2024-08-15 18:02:09,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3303020.0, ans=0.125 2024-08-15 18:02:35,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2024-08-15 18:02:39,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11500, loss[loss=0.09892, beats_loss=0.0121, ecapa_loss=0.0001269, whisper_loss=0.08555, over 17927.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001493, whisper_loss=0.09103, over 3907753.30 frames. ], batch size: 70, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:02:39,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3303220.0, ans=0.0 2024-08-15 18:02:54,426 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 18:02:54,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3303320.0, ans=0.125 2024-08-15 18:03:16,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3303420.0, ans=0.2 2024-08-15 18:03:19,303 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 18:03:24,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3303520.0, ans=0.125 2024-08-15 18:03:25,397 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 18:03:27,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3303520.0, ans=0.125 2024-08-15 18:03:36,881 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 18:03:39,620 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 18:03:45,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3303620.0, ans=0.1 2024-08-15 18:03:52,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11550, loss[loss=0.1019, beats_loss=0.0114, ecapa_loss=0.0001591, whisper_loss=0.08895, over 15678.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.000149, whisper_loss=0.09113, over 3896055.97 frames. ], batch size: 64, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:03:53,026 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 18:04:12,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.433e+01 2.629e+01 2.861e+01 8.078e+01, threshold=5.258e+01, percent-clipped=1.0 2024-08-15 18:04:37,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3304020.0, ans=0.0 2024-08-15 18:04:55,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3304120.0, ans=0.125 2024-08-15 18:05:08,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11600, loss[loss=0.08117, beats_loss=0.01397, ecapa_loss=0.000119, whisper_loss=0.06601, over 15478.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001478, whisper_loss=0.09079, over 3931195.82 frames. 
], batch size: 64, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:05:10,079 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 18:05:14,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3304220.0, ans=0.125 2024-08-15 18:05:22,382 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 18:05:28,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3304320.0, ans=0.0 2024-08-15 18:05:30,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3304320.0, ans=0.125 2024-08-15 18:05:41,326 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-15 18:05:49,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-15 18:05:51,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3304520.0, ans=0.0 2024-08-15 18:05:55,302 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0450630709528923, model_norm_threshold=52.58251190185547 2024-08-15 18:05:55,497 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.906e+05, grad_sumsq=1.906e+05, orig_rms_sq=1.000e+00 2024-08-15 18:05:58,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. 
limit=15.0 2024-08-15 18:06:03,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2024-08-15 18:06:09,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3304620.0, ans=0.0 2024-08-15 18:06:10,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3304620.0, ans=0.125 2024-08-15 18:06:20,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11650, loss[loss=0.1051, beats_loss=0.009697, ecapa_loss=0.0001458, whisper_loss=0.09394, over 15867.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.09056, over 3924776.47 frames. ], batch size: 61, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:06:20,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3304720.0, ans=0.0 2024-08-15 18:06:24,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3304720.0, ans=0.0 2024-08-15 18:06:32,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3304720.0, ans=0.0 2024-08-15 18:06:36,525 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 18:06:38,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3304820.0, ans=0.125 2024-08-15 18:06:40,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.450e+01 2.772e+01 2.999e+01 1.167e+03, threshold=5.544e+01, percent-clipped=1.0 2024-08-15 18:06:59,482 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
13 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-15 18:07:04,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-15 18:07:12,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3305020.0, ans=0.125 2024-08-15 18:07:22,070 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 18:07:25,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-15 18:07:29,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5 2024-08-15 18:07:31,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11700, loss[loss=0.08812, beats_loss=0.01126, ecapa_loss=0.0001632, whisper_loss=0.07523, over 21423.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.0001476, whisper_loss=0.0901, over 3923841.01 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:07:35,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3305220.0, ans=0.025 2024-08-15 18:07:38,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3305220.0, ans=0.0 2024-08-15 18:07:41,370 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 18:07:44,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3305320.0, ans=0.035 2024-08-15 18:07:45,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3305320.0, ans=0.1 2024-08-15 18:07:55,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3305320.0, ans=0.125 2024-08-15 18:07:56,854 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 9 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-15 18:07:57,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3305320.0, ans=0.125 2024-08-15 18:08:08,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2024-08-15 18:08:21,806 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 18:08:26,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3305520.0, ans=0.0 2024-08-15 18:08:34,743 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 18:08:43,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11750, loss[loss=0.1005, beats_loss=0.01294, ecapa_loss=0.0001577, whisper_loss=0.08603, over 22201.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01079, ecapa_loss=0.0001475, whisper_loss=0.09028, over 3942623.27 frames. 
], batch size: 94, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:08:44,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3305720.0, ans=0.025 2024-08-15 18:08:46,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3305720.0, ans=0.0 2024-08-15 18:08:52,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3305720.0, ans=0.125 2024-08-15 18:09:03,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.526e+01 2.838e+01 3.948e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-15 18:09:08,335 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 18:09:27,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2024-08-15 18:09:34,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3306020.0, ans=0.125 2024-08-15 18:09:36,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3306020.0, ans=0.125 2024-08-15 18:09:37,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3306020.0, ans=0.125 2024-08-15 18:09:40,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-15 18:09:41,537 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 18:09:55,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11800, loss[loss=0.09146, beats_loss=0.01309, ecapa_loss=0.0001558, whisper_loss=0.07681, over 21681.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001477, whisper_loss=0.09134, over 3943089.14 frames. ], batch size: 93, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:10:20,652 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 18:10:20,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3306320.0, ans=0.2 2024-08-15 18:10:27,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3306420.0, ans=0.2 2024-08-15 18:10:44,897 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 18:11:08,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11850, loss[loss=0.08899, beats_loss=0.009189, ecapa_loss=0.0001988, whisper_loss=0.07781, over 18204.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001479, whisper_loss=0.09128, over 3945416.87 frames. 
], batch size: 79, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:11:23,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3306820.0, ans=10.0 2024-08-15 18:11:28,385 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.292e+01 2.620e+01 2.942e+01 3.993e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-15 18:11:28,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3306820.0, ans=0.125 2024-08-15 18:11:33,435 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.514e+00 2024-08-15 18:11:40,440 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:11:48,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3306920.0, ans=0.1 2024-08-15 18:12:05,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3307120.0, ans=0.1 2024-08-15 18:12:09,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3307120.0, ans=0.125 2024-08-15 18:12:14,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3307120.0, ans=0.125 2024-08-15 18:12:20,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11900, loss[loss=0.09564, beats_loss=0.01128, ecapa_loss=0.0001114, whisper_loss=0.08325, over 15100.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001472, whisper_loss=0.09133, over 3964249.72 frames. 
], batch size: 57, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:12:24,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3307220.0, ans=0.125 2024-08-15 18:12:26,544 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 18:12:52,199 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 18:13:10,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3307520.0, ans=0.0 2024-08-15 18:13:11,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3307520.0, ans=0.125 2024-08-15 18:13:17,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3307620.0, ans=0.0 2024-08-15 18:13:21,389 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 18:13:23,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3307620.0, ans=0.125 2024-08-15 18:13:33,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11950, loss[loss=0.1296, beats_loss=0.008572, ecapa_loss=0.0001414, whisper_loss=0.1196, over 22985.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001485, whisper_loss=0.09103, over 3931102.83 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:13:36,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3307720.0, ans=0.2 2024-08-15 18:13:36,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.34 vs. 
limit=15.0 2024-08-15 18:13:39,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=3307720.0, ans=8.0 2024-08-15 18:13:40,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3307720.0, ans=0.125 2024-08-15 18:13:50,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3307820.0, ans=0.125 2024-08-15 18:13:52,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.261e+01 2.658e+01 2.941e+01 1.221e+02, threshold=5.315e+01, percent-clipped=2.0 2024-08-15 18:13:58,801 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.067e-02 2024-08-15 18:14:01,234 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 18:14:11,406 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 21 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-15 18:14:18,496 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 18:14:19,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-15 18:14:21,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3308020.0, ans=0.0 2024-08-15 18:14:37,338 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 18:14:44,629 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12000, loss[loss=0.08636, beats_loss=0.008714, ecapa_loss=0.0001801, whisper_loss=0.07584, over 16851.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001489, whisper_loss=0.09041, over 3930843.46 frames. 
], batch size: 72, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:14:44,631 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-15 18:15:14,876 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0342, 3.4666, 3.2165, 3.2912], device='cuda:0') 2024-08-15 18:15:24,351 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0.2463, over 922467.00 frames. 2024-08-15 18:15:43,510 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004172, beats_loss=0, ecapa_loss=0.0004172, whisper_loss=0, over 939242.00 frames. 2024-08-15 18:17:41,578 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 18:17:41,583 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32601MB 2024-08-15 18:17:43,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3308220.0, ans=0.1 2024-08-15 18:18:02,885 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 18:18:12,941 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-15 18:18:19,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3308420.0, ans=0.125 2024-08-15 18:18:32,399 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 18:18:50,080 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
15 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-15 18:18:55,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12050, loss[loss=0.1193, beats_loss=0.01106, ecapa_loss=0.0001336, whisper_loss=0.1069, over 19346.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001491, whisper_loss=0.09047, over 3940010.39 frames. ], batch size: 75, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:19:15,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3308820.0, ans=0.125 2024-08-15 18:19:16,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.282e+01 2.556e+01 2.851e+01 3.972e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 18:19:21,255 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-15 18:19:23,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3308820.0, ans=0.125 2024-08-15 18:19:31,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3308920.0, ans=0.125 2024-08-15 18:19:33,102 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-15 18:20:09,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3309220.0, ans=0.025 2024-08-15 18:20:10,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12100, loss[loss=0.1031, beats_loss=0.009618, ecapa_loss=0.0001476, whisper_loss=0.09202, over 23348.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.000149, whisper_loss=0.09048, over 3909869.68 frames. 
], batch size: 93, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:20:13,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3309220.0, ans=0.2 2024-08-15 18:20:16,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3309220.0, ans=0.1 2024-08-15 18:20:17,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=22.5 2024-08-15 18:20:18,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.61 vs. limit=22.5 2024-08-15 18:20:23,571 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 18:20:29,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3309320.0, ans=0.1 2024-08-15 18:20:29,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3309320.0, ans=0.125 2024-08-15 18:20:37,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-08-15 18:20:54,865 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 18:21:08,077 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 18:21:14,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3309620.0, ans=0.0 2024-08-15 18:21:17,496 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 18:21:17,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3309620.0, ans=0.1 2024-08-15 18:21:23,561 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-15 18:21:24,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12150, loss[loss=0.1209, beats_loss=0.006608, ecapa_loss=0.0001757, whisper_loss=0.1125, over 22412.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001504, whisper_loss=0.08981, over 3909165.37 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:21:31,325 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 18:21:46,289 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.187e+01 2.499e+01 2.897e+01 4.006e+01, threshold=4.998e+01, percent-clipped=0.0 2024-08-15 18:21:49,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3309820.0, ans=0.0 2024-08-15 18:22:04,371 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 18:22:09,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2024-08-15 18:22:22,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3310020.0, ans=0.2 2024-08-15 18:22:27,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-15 18:22:40,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12200, loss[loss=0.09326, beats_loss=0.01276, ecapa_loss=0.0001149, whisper_loss=0.07934, over 19175.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001499, whisper_loss=0.09016, over 3902056.02 frames. ], batch size: 74, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:22:40,350 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 18:23:07,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3310320.0, ans=0.125 2024-08-15 18:23:19,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3310420.0, ans=0.125 2024-08-15 18:23:22,055 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 18:23:51,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=15.0 2024-08-15 18:23:53,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3310720.0, ans=0.0 2024-08-15 18:23:54,469 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12250, loss[loss=0.1112, beats_loss=0.0103, ecapa_loss=0.0001277, whisper_loss=0.09958, over 19348.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001497, whisper_loss=0.09073, over 3883997.19 frames. ], batch size: 73, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:24:15,437 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.354e+01 2.587e+01 2.883e+01 9.186e+01, threshold=5.174e+01, percent-clipped=2.0 2024-08-15 18:24:15,653 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-15 18:24:17,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.19 vs. 
limit=15.0 2024-08-15 18:25:08,683 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12300, loss[loss=0.09814, beats_loss=0.01654, ecapa_loss=0.0001611, whisper_loss=0.07999, over 21823.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.00015, whisper_loss=0.09079, over 3888264.95 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:25:10,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3311220.0, ans=0.0 2024-08-15 18:25:47,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3311420.0, ans=0.125 2024-08-15 18:25:58,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3311520.0, ans=0.125 2024-08-15 18:26:24,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12350, loss[loss=0.1112, beats_loss=0.009678, ecapa_loss=0.0001308, whisper_loss=0.1002, over 18561.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001497, whisper_loss=0.09048, over 3847340.68 frames. ], batch size: 72, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:26:27,694 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 18:26:40,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.63 vs. 
limit=15.0 2024-08-15 18:26:41,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3311820.0, ans=0.025 2024-08-15 18:26:44,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.423e+01 2.680e+01 3.098e+01 2.023e+02, threshold=5.359e+01, percent-clipped=1.0 2024-08-15 18:26:46,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3311820.0, ans=0.0 2024-08-15 18:27:04,292 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-15 18:27:24,126 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 18:27:38,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12400, loss[loss=0.1191, beats_loss=0.01042, ecapa_loss=0.0001443, whisper_loss=0.1072, over 22721.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001489, whisper_loss=0.09083, over 3871221.99 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:27:56,700 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 15 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-15 18:27:57,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.61 vs. limit=22.5 2024-08-15 18:28:26,401 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 18:28:28,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3312520.0, ans=0.015 2024-08-15 18:28:29,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3312520.0, ans=0.125 2024-08-15 18:28:32,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.37 vs. limit=22.5 2024-08-15 18:28:37,582 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 18:28:52,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12450, loss[loss=0.1202, beats_loss=0.008033, ecapa_loss=0.0002069, whisper_loss=0.1101, over 19087.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001505, whisper_loss=0.09056, over 3904979.20 frames. ], batch size: 82, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:28:53,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3312720.0, ans=0.0 2024-08-15 18:28:59,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3312720.0, ans=0.0 2024-08-15 18:29:02,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3312720.0, ans=0.125 2024-08-15 18:29:13,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.349e+01 2.594e+01 2.911e+01 3.951e+02, threshold=5.187e+01, percent-clipped=3.0 2024-08-15 18:29:20,375 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:29:49,781 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
18 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 18:29:56,133 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-15 18:30:05,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3313120.0, ans=0.1 2024-08-15 18:30:07,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12500, loss[loss=0.1099, beats_loss=0.01193, ecapa_loss=0.0001168, whisper_loss=0.09683, over 16606.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001496, whisper_loss=0.09098, over 3903117.67 frames. ], batch size: 63, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:30:18,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3313220.0, ans=0.125 2024-08-15 18:30:18,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3313220.0, ans=0.0 2024-08-15 18:30:32,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-15 18:30:41,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=15.0 2024-08-15 18:30:52,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3313520.0, ans=0.125 2024-08-15 18:31:01,689 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-15 18:31:03,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3313520.0, ans=0.0 2024-08-15 18:31:05,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.65 vs. 
limit=12.0 2024-08-15 18:31:19,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3313620.0, ans=0.125 2024-08-15 18:31:23,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12550, loss[loss=0.07937, beats_loss=0.01153, ecapa_loss=0.0001086, whisper_loss=0.06676, over 17541.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001487, whisper_loss=0.09155, over 3938586.83 frames. ], batch size: 66, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:31:44,787 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.273e+01 2.491e+01 2.694e+01 3.703e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-15 18:31:52,616 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 18:32:27,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3314120.0, ans=0.0 2024-08-15 18:32:39,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12600, loss[loss=0.1156, beats_loss=0.01065, ecapa_loss=0.0001504, whisper_loss=0.1035, over 22705.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01048, ecapa_loss=0.0001473, whisper_loss=0.0919, over 3894086.20 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:32:43,857 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-15 18:32:47,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.30 vs. 
limit=22.5 2024-08-15 18:32:58,705 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08482968807220459, model_norm_threshold=49.81049346923828 2024-08-15 18:32:58,898 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.887e+04, grad_sumsq=6.887e+04, orig_rms_sq=1.000e+00 2024-08-15 18:33:10,406 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 18:33:10,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3314420.0, ans=0.0 2024-08-15 18:33:18,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2024-08-15 18:33:20,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3314420.0, ans=0.125 2024-08-15 18:33:25,484 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 18:33:27,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3314520.0, ans=0.0 2024-08-15 18:33:28,363 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-15 18:33:31,280 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-15 18:33:40,252 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 18:33:53,194 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12650, loss[loss=0.1174, beats_loss=0.01044, ecapa_loss=0.0001264, whisper_loss=0.1057, over 23127.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001476, whisper_loss=0.09119, over 3901048.24 frames. 
], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:34:05,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3314720.0, ans=0.2 2024-08-15 18:34:13,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.371e+01 2.618e+01 2.895e+01 5.872e+02, threshold=5.236e+01, percent-clipped=1.0 2024-08-15 18:34:14,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3314820.0, ans=0.125 2024-08-15 18:34:24,547 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 18:34:27,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3314920.0, ans=0.125 2024-08-15 18:34:31,984 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 18:34:40,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3315020.0, ans=0.2 2024-08-15 18:34:48,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3315020.0, ans=0.125 2024-08-15 18:35:01,411 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 18:35:07,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12700, loss[loss=0.09834, beats_loss=0.01318, ecapa_loss=0.0001531, whisper_loss=0.08362, over 23323.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001481, whisper_loss=0.09104, over 3937915.18 frames. ], batch size: 96, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:35:30,478 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 18:35:40,805 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 18:35:57,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-15 18:35:58,739 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 18:35:59,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3315520.0, ans=0.125 2024-08-15 18:36:10,364 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 18:36:19,718 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 18:36:22,300 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12750, loss[loss=0.1007, beats_loss=0.01137, ecapa_loss=0.0001562, whisper_loss=0.08776, over 22697.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001489, whisper_loss=0.09113, over 3920012.78 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:36:30,239 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 18:36:38,953 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 18:36:40,411 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
20 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 18:36:43,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.562e+01 2.837e+01 4.631e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-15 18:36:43,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3315820.0, ans=0.0 2024-08-15 18:36:53,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2024-08-15 18:37:02,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3315920.0, ans=0.125 2024-08-15 18:37:08,040 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 18:37:22,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3316120.0, ans=0.0 2024-08-15 18:37:32,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3316120.0, ans=0.1 2024-08-15 18:37:36,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12800, loss[loss=0.09442, beats_loss=0.01201, ecapa_loss=0.0001496, whisper_loss=0.08092, over 22647.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001488, whisper_loss=0.09087, over 3930452.05 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:37:41,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3316220.0, ans=0.125 2024-08-15 18:37:42,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3316220.0, ans=0.0 2024-08-15 18:37:47,055 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
24 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 18:38:13,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316420.0, ans=0.1 2024-08-15 18:38:16,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3316420.0, ans=0.2 2024-08-15 18:38:31,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3316520.0, ans=0.125 2024-08-15 18:38:36,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3316620.0, ans=0.1 2024-08-15 18:38:41,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3316620.0, ans=0.0 2024-08-15 18:38:43,886 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 18:38:52,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12850, loss[loss=0.1128, beats_loss=0.008831, ecapa_loss=0.0001714, whisper_loss=0.1022, over 18256.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001484, whisper_loss=0.09044, over 3911793.04 frames. ], batch size: 70, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:39:05,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3316720.0, ans=0.125 2024-08-15 18:39:05,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.81 vs. 
limit=15.0 2024-08-15 18:39:13,599 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.324e+01 2.629e+01 2.874e+01 4.372e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-15 18:39:35,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2024-08-15 18:39:36,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-08-15 18:39:39,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-15 18:39:43,548 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 18:40:07,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12900, loss[loss=0.1087, beats_loss=0.01174, ecapa_loss=0.0001431, whisper_loss=0.09556, over 24011.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001483, whisper_loss=0.08973, over 3874178.82 frames. ], batch size: 94, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:40:08,716 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 18:40:09,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3317220.0, ans=0.09899494936611666 2024-08-15 18:40:13,082 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 18:40:28,194 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
34 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-15 18:40:37,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3317420.0, ans=0.125 2024-08-15 18:40:38,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-15 18:40:44,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3317420.0, ans=0.1 2024-08-15 18:40:48,981 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 18:41:01,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3317520.0, ans=0.2 2024-08-15 18:41:08,116 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 18:41:21,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 12950, loss[loss=0.07527, beats_loss=0.01037, ecapa_loss=0.0001552, whisper_loss=0.06335, over 20727.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001501, whisper_loss=0.08978, over 3853602.86 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:41:28,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3317720.0, ans=0.2 2024-08-15 18:41:40,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.294e+01 2.551e+01 2.879e+01 4.880e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-15 18:41:41,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. 
limit=12.0 2024-08-15 18:41:42,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3317820.0, ans=0.125 2024-08-15 18:41:47,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3317820.0, ans=0.035 2024-08-15 18:42:08,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3318020.0, ans=0.1 2024-08-15 18:42:34,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13000, loss[loss=0.1098, beats_loss=0.009179, ecapa_loss=0.000162, whisper_loss=0.09899, over 21849.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001494, whisper_loss=0.09043, over 3881760.05 frames. ], batch size: 86, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:42:44,887 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 18:42:46,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3318220.0, ans=0.1 2024-08-15 18:43:01,083 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 18:43:07,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3318420.0, ans=0.0 2024-08-15 18:43:28,328 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 18:43:34,340 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-15 18:43:39,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3318620.0, ans=0.2 2024-08-15 18:43:46,235 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
23 from LS+wenet, 8 from Vox, 28 fro AS 2024-08-15 18:43:47,848 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 18:43:48,942 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13050, loss[loss=0.1235, beats_loss=0.008537, ecapa_loss=0.0001511, whisper_loss=0.1134, over 16133.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.000149, whisper_loss=0.09092, over 3912094.62 frames. ], batch size: 64, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:43:53,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3318720.0, ans=0.0 2024-08-15 18:44:02,368 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 18:44:08,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3318820.0, ans=0.2 2024-08-15 18:44:09,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-08-15 18:44:09,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.364e+01 2.592e+01 2.928e+01 7.191e+01, threshold=5.184e+01, percent-clipped=1.0 2024-08-15 18:44:15,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3318820.0, ans=0.125 2024-08-15 18:45:00,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3319220.0, ans=0.0 2024-08-15 18:45:01,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. 
limit=15.0 2024-08-15 18:45:01,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13100, loss[loss=0.1166, beats_loss=0.009719, ecapa_loss=0.000149, whisper_loss=0.1054, over 16360.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.000149, whisper_loss=0.0909, over 3926233.22 frames. ], batch size: 67, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:45:04,790 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-15 18:45:15,829 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 18:45:19,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3319320.0, ans=0.0 2024-08-15 18:45:26,673 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 18:45:37,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3319420.0, ans=0.125 2024-08-15 18:45:43,746 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 18:45:51,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2024-08-15 18:46:04,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3319620.0, ans=0.125 2024-08-15 18:46:05,835 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
17 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-15 18:46:07,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3319620.0, ans=0.125 2024-08-15 18:46:15,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3319620.0, ans=0.125 2024-08-15 18:46:19,565 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13150, loss[loss=0.1155, beats_loss=0.0114, ecapa_loss=0.0001563, whisper_loss=0.1025, over 21517.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001483, whisper_loss=0.09091, over 3876983.27 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:46:24,308 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 18:46:24,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3319720.0, ans=0.1 2024-08-15 18:46:28,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.53 vs. 
limit=10.0 2024-08-15 18:46:41,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.374e+01 2.573e+01 2.884e+01 4.147e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-15 18:46:41,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3319820.0, ans=0.125 2024-08-15 18:46:46,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3319820.0, ans=0.1 2024-08-15 18:47:02,985 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-332000.pt 2024-08-15 18:47:29,021 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 18:47:43,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13200, loss[loss=0.0942, beats_loss=0.01133, ecapa_loss=0.0001528, whisper_loss=0.08134, over 14150.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.000148, whisper_loss=0.09073, over 3857855.04 frames. ], batch size: 57, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:47:44,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3320220.0, ans=0.125 2024-08-15 18:47:47,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3320220.0, ans=0.2 2024-08-15 18:48:02,179 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 18:48:19,083 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-15 18:48:22,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-08-15 18:48:24,986 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-15 18:48:25,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3320420.0, ans=0.07 2024-08-15 18:48:28,666 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 18:48:28,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3320420.0, ans=0.09899494936611666 2024-08-15 18:48:31,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3320420.0, ans=0.0 2024-08-15 18:48:40,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3320520.0, ans=0.125 2024-08-15 18:48:57,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3320620.0, ans=0.0 2024-08-15 18:49:02,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3320620.0, ans=0.125 2024-08-15 18:49:06,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13250, loss[loss=0.09816, beats_loss=0.01267, ecapa_loss=0.0001124, whisper_loss=0.08436, over 22801.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001477, whisper_loss=0.0909, over 3878807.77 frames. 
], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:49:17,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3320720.0, ans=0.125 2024-08-15 18:49:18,895 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.403e+00 2024-08-15 18:49:30,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.301e+01 2.680e+01 3.189e+01 5.288e+01, threshold=5.359e+01, percent-clipped=1.0 2024-08-15 18:49:32,463 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-15 18:49:33,878 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-15 18:49:42,465 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 18:50:03,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3321020.0, ans=0.0 2024-08-15 18:50:18,245 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 18:50:22,708 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 18:50:29,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13300, loss[loss=0.09996, beats_loss=0.01177, ecapa_loss=0.0001758, whisper_loss=0.08643, over 19388.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.09036, over 3882497.17 frames. ], batch size: 82, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:50:32,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3321220.0, ans=0.0 2024-08-15 18:50:33,737 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 18:50:35,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3321220.0, ans=0.0 2024-08-15 18:50:37,933 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 18:50:38,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321220.0, ans=0.1 2024-08-15 18:51:02,463 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 18:51:13,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321420.0, ans=0.1 2024-08-15 18:51:13,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3321420.0, ans=0.95 2024-08-15 18:51:22,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3321520.0, ans=0.0 2024-08-15 18:51:43,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3321620.0, ans=0.125 2024-08-15 18:51:55,937 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13350, loss[loss=0.09926, beats_loss=0.009734, ecapa_loss=0.0001738, whisper_loss=0.08779, over 22354.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001478, whisper_loss=0.09089, over 3862775.06 frames. 
], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:51:56,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3321720.0, ans=0.1 2024-08-15 18:52:21,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.271e+01 2.660e+01 2.983e+01 5.401e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-15 18:52:42,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3321920.0, ans=0.07 2024-08-15 18:52:46,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3321920.0, ans=0.0 2024-08-15 18:53:06,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-15 18:53:09,491 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 18:53:21,910 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 18:53:22,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-15 18:53:23,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13400, loss[loss=0.1133, beats_loss=0.009345, ecapa_loss=0.000138, whisper_loss=0.1025, over 22748.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001484, whisper_loss=0.09045, over 3844253.61 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:53:24,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.86 vs. 
limit=15.0 2024-08-15 18:53:42,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3322320.0, ans=0.0 2024-08-15 18:53:45,679 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 18:53:47,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3322320.0, ans=0.015 2024-08-15 18:53:54,933 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 18:53:56,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-08-15 18:54:00,938 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 32 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 18:54:03,120 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 18:54:09,446 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:54:15,176 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 18:54:21,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.09 vs. limit=10.0 2024-08-15 18:54:46,805 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-15 18:54:50,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13450, loss[loss=0.1139, beats_loss=0.01027, ecapa_loss=0.0001123, whisper_loss=0.1026, over 23977.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001481, whisper_loss=0.09084, over 3867154.37 frames. 
], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:55:09,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3322820.0, ans=0.125 2024-08-15 18:55:11,425 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 18:55:14,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.424e+01 2.655e+01 2.945e+01 1.400e+03, threshold=5.311e+01, percent-clipped=0.0 2024-08-15 18:55:14,897 WARNING [optim.py:496] (0/4) Scaling gradients by 0.037934403866529465, model_norm_threshold=53.10542297363281 2024-08-15 18:55:15,081 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.249e+05, grad_sumsq=5.178e+07, orig_rms_sq=1.014e-02 2024-08-15 18:55:17,144 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:55:25,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3322920.0, ans=0.015 2024-08-15 18:55:39,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3322920.0, ans=0.125 2024-08-15 18:55:46,057 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 18:56:03,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3323120.0, ans=0.1 2024-08-15 18:56:04,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3323120.0, ans=0.0 2024-08-15 18:56:16,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13500, loss[loss=0.08562, beats_loss=0.01195, ecapa_loss=0.000111, whisper_loss=0.07256, over 16159.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01047, ecapa_loss=0.0001481, whisper_loss=0.0911, over 3850068.57 frames. ], batch size: 62, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:56:18,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3323220.0, ans=0.125 2024-08-15 18:56:30,655 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 18:56:50,619 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 18:56:50,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3323420.0, ans=0.0 2024-08-15 18:56:53,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.58 vs. limit=10.0 2024-08-15 18:56:56,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3323420.0, ans=0.0 2024-08-15 18:57:44,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13550, loss[loss=0.1058, beats_loss=0.01023, ecapa_loss=0.0001455, whisper_loss=0.09414, over 21865.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001492, whisper_loss=0.09096, over 3832591.95 frames. 
], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:57:44,665 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 18:58:02,087 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-15 18:58:05,858 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:58:08,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.273e+01 2.505e+01 2.907e+01 8.129e+01, threshold=5.010e+01, percent-clipped=4.0 2024-08-15 18:58:13,618 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 18:58:19,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3323920.0, ans=0.125 2024-08-15 18:58:27,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3323920.0, ans=0.0 2024-08-15 18:58:39,409 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-15 18:58:47,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3324020.0, ans=0.125 2024-08-15 18:59:07,238 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 18:59:10,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13600, loss[loss=0.1104, beats_loss=0.01018, ecapa_loss=0.0001469, whisper_loss=0.09879, over 24100.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001493, whisper_loss=0.09086, over 3846328.55 frames. 
], batch size: 95, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:59:12,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3324220.0, ans=0.0 2024-08-15 18:59:22,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3324220.0, ans=0.0 2024-08-15 18:59:29,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=12.0 2024-08-15 18:59:35,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3324320.0, ans=0.0 2024-08-15 19:00:06,818 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 19:00:22,237 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-15 19:00:34,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13650, loss[loss=0.08242, beats_loss=0.01399, ecapa_loss=0.0001211, whisper_loss=0.06721, over 22880.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.00015, whisper_loss=0.09051, over 3846517.80 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 19:00:58,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.294e+01 2.538e+01 2.832e+01 8.240e+01, threshold=5.075e+01, percent-clipped=1.0 2024-08-15 19:01:00,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3324820.0, ans=0.125 2024-08-15 19:01:06,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3324820.0, ans=0.125 2024-08-15 19:01:46,878 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 19:01:51,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3325120.0, ans=0.1 2024-08-15 19:01:59,443 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13700, loss[loss=0.1056, beats_loss=0.01272, ecapa_loss=0.0001405, whisper_loss=0.09142, over 21946.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001495, whisper_loss=0.09008, over 3840248.15 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 19:02:39,619 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 19:02:53,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3325520.0, ans=0.2 2024-08-15 19:31:40,142 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-15 19:45:38,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-15 19:50:22,766 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-15 19:59:43,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13750, loss[loss=0.09239, beats_loss=0.01427, ecapa_loss=0.0001342, whisper_loss=0.07677, over 19664.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001493, whisper_loss=0.08959, over 3879742.10 frames. ], batch size: 81, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 20:40:25,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. 
limit=10.0 2024-08-15 20:47:28,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.373e+01 2.597e+01 2.828e+01 1.512e+02, threshold=5.195e+01, percent-clipped=2.0 2024-08-15 21:07:39,563 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 21:40:56,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3326020.0, ans=0.125 2024-08-15 22:29:44,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3326120.0, ans=0.125 2024-08-15 22:41:52,438 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13800, loss[loss=0.1073, beats_loss=0.009337, ecapa_loss=0.0001427, whisper_loss=0.09652, over 23468.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001482, whisper_loss=0.08958, over 3847040.23 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 22:59:38,225 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 23:46:29,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326420.0, ans=0.1 2024-08-16 00:01:43,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326420.0, ans=0.1 2024-08-16 01:12:17,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13850, loss[loss=0.1112, beats_loss=0.009158, ecapa_loss=0.0001969, whisper_loss=0.1, over 21328.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001491, whisper_loss=0.08971, over 3855222.65 frames. 
], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 01:18:42,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3326720.0, ans=0.05 2024-08-16 01:30:59,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3326720.0, ans=0.125 2024-08-16 01:55:15,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.228e+01 2.445e+01 2.757e+01 2.786e+02, threshold=4.891e+01, percent-clipped=1.0 2024-08-16 02:10:53,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3326920.0, ans=0.125 2024-08-16 02:44:34,252 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 21 from LS+wenet, 24 from Vox, 49 fro AS 2024-08-16 02:48:26,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3327020.0, ans=0.0 2024-08-16 02:54:33,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3327020.0, ans=0.04949747468305833 2024-08-16 03:35:49,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=3327120.0, ans=15.0 2024-08-16 03:35:49,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.77 vs. limit=15.0 2024-08-16 03:47:18,312 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13900, loss[loss=0.1213, beats_loss=0.00963, ecapa_loss=0.0001355, whisper_loss=0.1103, over 22933.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001484, whisper_loss=0.0906, over 3876079.65 frames. 
], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 03:51:32,446 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-16 04:32:26,930 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-16 04:32:27,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3327320.0, ans=0.07 2024-08-16 05:17:47,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3327520.0, ans=0.2 2024-08-16 05:42:17,567 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-16 05:42:17,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3327520.0, ans=0.0 2024-08-16 06:05:23,818 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-16 06:11:23,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3327620.0, ans=0.1 2024-08-16 06:16:02,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 13950, loss[loss=0.09965, beats_loss=0.01149, ecapa_loss=0.0001542, whisper_loss=0.08662, over 22584.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.000148, whisper_loss=0.09024, over 3894490.15 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 06:21:12,210 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
27 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-16 06:41:46,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3327720.0, ans=0.0 2024-08-16 06:47:08,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3327820.0, ans=0.5 2024-08-16 06:47:08,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2024-08-16 06:50:49,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3327820.0, ans=0.2 2024-08-16 06:57:05,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.658e+01 3.040e+01 4.567e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-16 07:25:52,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3327920.0, ans=0.1 2024-08-16 07:48:32,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3328020.0, ans=15.0 2024-08-16 07:55:19,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328020.0, ans=0.1 2024-08-16 08:04:27,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3328020.0, ans=0.125 2024-08-16 09:02:46,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 14000, loss[loss=0.1044, beats_loss=0.01137, ecapa_loss=0.0001748, whisper_loss=0.09131, over 19303.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001474, whisper_loss=0.09067, over 3879520.08 frames. 
], batch size: 80, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 09:13:14,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0 2024-08-16 09:37:33,834 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-16 09:45:56,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3328420.0, ans=0.0 2024-08-16 09:48:00,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3328420.0, ans=0.125 2024-08-16 10:34:15,423 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-16 10:35:39,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3328520.0, ans=0.125 2024-08-16 10:35:39,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0